(57) Abstract



A synthetic nucleic acid sequence and a method are disclosed for selectively expressing a protein in a target cell or tissue of a 
mammal. Selective expression is effected by replacing at least one existing codon of a parent nucleic acid sequence encoding a protein of 
interest with a synonymous codon to produce a synthetic nucleic acid sequence having altered translational kinetics compared to the parent 
nucleic acid sequence. The synonymous codon is selected such that it corresponds to an iso-tRNA which, when compared to an iso-tRNA 
corresponding to the at least one existing codon, is in higher abundance in the target cell or tissue relative to one or more other cells or 
tissues of the mammal. 
TITLE

"NUCLEIC ACID SEQUENCE AND METHOD FOR SELECTIVELY
EXPRESSING A PROTEIN IN A TARGET CELL OR TISSUE"

FIELD OF THE INVENTION 

THIS INVENTION relates generally to gene
therapy. More particularly, the present invention
relates to a synthetic nucleic acid sequence and to a
method for selectively expressing a protein in a
target cell or tissue in which at least one existing
codon of a parent nucleic acid sequence encoding the
protein has been replaced with a synonymous codon.
The invention also relates to production of virus
particles using one or more synthetic nucleic acid
sequences and the method according to the invention.

BACKGROUND OF THE INVENTION

While gene therapy is of great clinical
interest for treatment of gene defects, this therapy
has not entered into mainstream clinical practice
because selective delivery of genes to target tissues
has proven extremely difficult. Currently, viral
vectors are used, particularly retroviruses and
adenoviruses, which are to some extent selective.
However, many vector systems are by their nature
unable to produce stable integrants and some also
invoke immune responses, thereby preventing effective
treatment. Alternatively, "naked" DNA is packaged in
liposomes or other similar delivery systems. A major
problem to be overcome is that such gene delivery
systems themselves are not tissue selective, whereas
selective targeting of genes to particular tissues
would be desirable for many disorders (e.g., cancer
therapy). While use of tissue specific promoters to
target gene therapy has been effective in some animal
models, it has proven less so in man, and selective
tissue specific promoters are not available for a wide
range of tissues.

The current invention has arisen
unexpectedly from recent investigations exploring why
papillomavirus (PV) late gene expression is restricted
to differentiated keratinocytes. In this regard, it
is known that PV late genes L1 and L2 are only
expressed in non-dividing differentiated keratinocytes
(KCs). Many investigators including the present
inventors have been unable to detect significant PV L1
and L2 protein expression when these genes are
transduced or transfected into undifferentiated
cultured cells, using a range of conventional
constitutive viral promoters including retroviral long
terminal repeats (LTRs) and the strong constitutive
promoters of CMV and SV40.

PV L1 mRNA can, however, be efficiently
translated in vitro using rabbit reticulocyte cell
lysate, suggesting that there are no cellular
inhibitors in the lysate interfering with translation
of L1. The major difference between the in vitro and
in vivo translation systems is that L1 mRNA is the
dominant mRNA in in vitro translation reactions,
while it constitutes a minor fraction of the
cellular mRNAs in intact cells.
In vivo, PV late proteins are not produced
in undifferentiated KC. However, they are expressed
in large quantity in highly differentiated KC. The
mechanism of this tight control of late gene
expression has been poorly understood, and searches by
many groups for KC specific PV gene transcriptional
control proteins have been unrewarding.

Blockage to translation of L1 mRNA in vivo
has been attributed to sequences within the L1 ORF
(Tan et al. 1995, J. Virol. 69 5607-5620; Tan and
Schwartz, 1995, J. Virol. 69 2932-2945). By using
Rev and the Rev-responsive element of HIV, such inhibition
could be overcome (Tan et al. 1995, supra).
Accordingly, the inventors examined whether removal of
putative "inhibitory sequences" in the L1 ORF would
allow production of L1 protein in undifferentiated
cells. Deletion mutagenesis of BPV L1 to remove
putative inhibitory sequences and expression of
resultant deletion mutants in CV-1 cells revealed,
surprisingly, that despite expression of L1 mRNA, L1
protein could not be detected.

In view of the foregoing, it has been
difficult hitherto to understand how papillomaviruses
produce large amounts of L1 protein in the late stage
of their life cycle using this apparently
"untranslatable" gene.

Surprisingly, however, it has now been
discovered that PV L1 protein can be produced at
substantially enhanced levels in an undifferentiated
host cell by replacing existing codons of a native L1
gene with synonymous codons used at relatively high
frequency by genes of the undifferentiated host cell
compared to the existing codons. It has also been
found unexpectedly that there are substantial
differences in the relative abundance of particular
isoaccepting transfer RNAs (tRNAs) in different cells
or tissues and this plays a pivotal role in protein
expression from a gene with a given codon usage or
composition. This discovery has been reduced to
practice in synthetic nucleic acid sequences and
generic methods, which utilize codon alteration as a
means for targeting expression of a protein to
particular cells or tissues or alternatively, to cells
in a specific state of differentiation.

OBJECT OF THE INVENTION

It is therefore an object of the present
invention to provide a synthetic nucleic acid sequence
and a method for selectively expressing a protein in a
target cell or tissue, which sequence and method
ameliorate at least some of the disadvantages
associated with the prior art.

SUMMARY OF THE INVENTION 

Accordingly, in one aspect of the
invention, there is provided a synthetic nucleic acid
sequence capable of selectively expressing a protein
in a target cell or tissue of a mammal, wherein said
selective expression is effected by replacing at least
one existing codon of a parent nucleic acid sequence
with a synonymous codon to form said synthetic nucleic
acid sequence.
Suitably, said synonymous codon corresponds
to an iso-tRNA which, when compared to an iso-tRNA
corresponding to the at least one existing codon, is
in higher abundance in the target cell or tissue
relative to one or more other cells or tissues of the
mammal.

Preferably, said synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to the at least one existing
codon, is in higher abundance in the target cell or
tissue relative to a precursor cell or tissue.

Alternatively, said synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to the at least one existing
codon, is in higher abundance in the target cell or
tissue relative to a cell or tissue derived therefrom.

Advantageously, said corresponding iso-tRNA
in said target cell or tissue is at a level which is
at least 110%, preferably at least 200%, more
preferably at least 500%, and most preferably at least
1000%, of that expressed in the or each other cell or
tissue of the mammal.

Alternatively, the synonymous codon may be
selected from the group consisting of (1) a codon used
at relatively high frequency by genes, preferably
highly expressed genes, of the target cell or tissue,
(2) a codon used at relatively high frequency by
genes, preferably highly expressed genes, of the or
each other cell or tissue, (3) a codon used at
relatively high frequency by genes, preferably highly
expressed genes, of the mammal, (4) a codon used at
relatively low frequency by genes of the target cell
or tissue, (5) a codon used at relatively low
frequency by genes of the or each other cell or
tissue, (6) a codon used at relatively low frequency
by genes of the mammal, (7) a codon used at relatively
high frequency by genes of another organism, and (8) a
codon used at relatively low frequency by genes of
another organism.

In a preferred embodiment, the at least one
existing codon and the synonymous codon are
selected such that said protein is expressed from said
synthetic nucleic acid sequence in said target cell or
tissue at a level which is at least 110%, preferably
at least 200%, more preferably at least 500%, and most
preferably at least 1000%, of that expressed from said
parent nucleic acid sequence in said target cell or
tissue.

In another aspect, the invention resides in
a method for selectively expressing a protein in a
target cell or tissue of a mammal, wherein said
selective expression is effected by replacing at least
one existing codon of a parent nucleic acid sequence
with a synonymous codon to form a synthetic nucleic
acid sequence.

Preferably, the method is further
characterized by the steps of:

(a) replacing at least one existing codon
of a parent nucleic acid sequence encoding said
protein with a synonymous codon to produce a synthetic
nucleic acid sequence having altered translational
kinetics compared to said parent nucleic acid sequence
such that said protein is selectively expressible in
said target cell or tissue;

(b) administering to the mammal and
introducing into said target cell or tissue, or a
precursor cell or precursor tissue thereof, said
synthetic nucleic acid sequence operably linked to one
or more regulatory nucleotide sequences; and

(c) selectively expressing said protein
in said target cell or tissue.

Preferably, the method further includes,
prior to step (a):

(i) measuring relative abundance of
different isoacceptor transfer RNAs in said target
cell or tissue, and in one or more other cells or
tissues of the mammal; and

(ii) identifying said at least one
existing codon and said synonymous codon based on said
measurement, wherein said synonymous codon corresponds
to an iso-tRNA which, when compared to an iso-tRNA
corresponding to the existing codon, is in higher
abundance in said target cell or tissue relative to
the or each other cell or tissue of the mammal.

Suitably, step (ii) above is further
characterized in that said synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to the at least one existing
codon, is in higher abundance in the target cell or
tissue relative to a precursor cell or tissue.

Alternatively, step (ii) above is further
characterized in that said synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to the at least one existing
codon, is in higher abundance in the target cell or
tissue relative to a cell or tissue derived therefrom.
Alternatively, the method further includes,
prior to step (a), identifying said at least one
existing codon and said synonymous codon based on
respective relative frequencies of particular codons
used by genes selected from the group consisting of
(I) genes of the target cell or tissue, (II) genes of
the or each other cell or tissue, (III) genes of the
mammal, and (IV) genes of another organism.
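
By way of illustration only, relative codon frequencies of the kind referred to above may be tabulated as in the following minimal Python sketch. The function names `codon_counts` and `relative_frequency` and the input sequences are hypothetical and are not taken from the specification; the sketch assumes in-frame open reading frames supplied as plain strings.

```python
from collections import Counter


def codon_counts(orfs):
    """Count codons over a collection of in-frame ORF strings (DNA or RNA)."""
    counts = Counter()
    for orf in orfs:
        seq = orf.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
        counts.update(seq[i:i + 3] for i in range(0, usable, 3))
    return counts


def relative_frequency(counts, synonymous_family):
    """Fraction of each codon within one family of synonymous codons."""
    total = sum(counts[c] for c in synonymous_family)
    if total == 0:
        return {c: 0.0 for c in synonymous_family}
    return {c: counts[c] / total for c in synonymous_family}


# Hypothetical example sequences, for illustration only.
example = codon_counts(["GCAGCAGCCCTG", "GCTGCA"])
alanine = relative_frequency(example, ["GCA", "GCC", "GCG", "GCT"])
```

Comparing such tables computed for genes of the target cell or tissue, of other cells or tissues, of the mammal, or of another organism is one way in which candidate existing and synonymous codons might be ranked.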

In yet another aspect, the invention
provides a method for expressing a protein in a target
cell or tissue from a first nucleic acid sequence
including the steps of:

introducing into said target cell or
tissue, or a precursor cell or precursor tissue
thereof, a second nucleic acid sequence encoding at
least one isoaccepting transfer RNA wherein said
second nucleic acid sequence is operably linked to one
or more regulatory nucleotide sequences, and wherein
said at least one isoaccepting transfer RNA is
normally in relatively low abundance in said target
cell or tissue and corresponds to a codon of said
first nucleic acid sequence.

In a further aspect, the invention extends
to a method for producing a virus particle in a
cycling eukaryotic cell, said virus particle
comprising at least one protein necessary for assembly
of said virus particle, wherein said at least one
protein is not expressed in said cell from a parent
nucleic acid sequence at a level sufficient to permit
virus assembly therein, said method including the
steps of:

(a) replacing at least one existing codon
of said parent nucleic acid sequence with a synonymous
codon to produce a synthetic nucleic acid sequence
having altered translational kinetics compared to said
parent nucleic acid sequence such that said at least
one protein is expressible from said synthetic nucleic
acid sequence in said cell at a level sufficient to
permit virus assembly therein;

(b) introducing into said cell or a
precursor thereof said synthetic nucleic acid sequence
operably linked to one or more regulatory nucleotide
sequences; and

(c) expressing said at least one protein
in said cell in the presence of other viral proteins
required for assembly of said virus particle to
thereby produce said virus particle.

In yet a further aspect of the invention,
there is provided a method for producing a virus
particle in a cycling cell, said virus particle
comprising at least one protein necessary for assembly
of said virus particle, wherein said at least one
protein is not expressed in said cell from a parent
nucleic acid sequence at a level sufficient to permit
virus assembly therein, and wherein at least one
existing codon of said parent nucleic acid sequence is
rate limiting for the production of said at least one
protein to said level, said method including the step
of introducing into said cell a nucleic acid sequence
capable of expressing therein an isoaccepting transfer
RNA specific for said at least one codon.
BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1A depicts the nucleotide sequence
(SEQ ID NO: 1) and deduced amino acid sequence (SEQ ID
NO: 2) of BPV1 L1. Amino acids (in single letter code)
are presented below the second nucleotide of each
codon. Mutations introduced into the genes are
indicated above the corresponding nucleotides of the
original sequence. Horizontal lines indicate the
sites and enzymes used for cloning. This replacement
of nucleotides resulted in a nucleic acid sequence
encoding BPV-1 L1 polypeptide with an amino acid
sequence identical to the wild type, but having
synonymous codons that are frequently used by
mammalian genes.

Figure 1B shows the nucleotide sequence
(SEQ ID NO: 5) and deduced amino acid sequence (SEQ ID
NO: 6) relating to the BPV1 L2 ORF. Amino acids (in single
letter code) are presented below the second nucleotide
of each codon. Mutations introduced into the genes
are indicated above the corresponding nucleotides of
the original sequence. Horizontal lines indicate the
sites and enzymes used for cloning. This replacement
of nucleotides resulted in a nucleic acid sequence
encoding BPV-1 L2 polypeptide with an amino acid
sequence identical to the wild type, but having
synonymous codons that are frequently used by
mammalian genes.

Figure 1C depicts the nucleotide sequence
(SEQ ID NO: 9) and deduced amino acid sequence (SEQ ID
NO: 10) of green fluorescent protein (GFP). Amino
acids (in single letter code) are presented below the
second nucleotide of each codon. Mutations introduced
into the genes are indicated above the corresponding
nucleotides of the original sequence. Horizontal
lines indicate the sites and enzymes used for cloning.
This replacement of nucleotides resulted in a nucleic
acid sequence encoding GFP polypeptide with an amino
acid sequence identical to the native sequence
modified for optimal expression in eukaryotic cells,
but having synonymous codons that are frequently used
by papillomavirus genes.

Figure 2A shows detection of L1 protein
expressed from synthetic and wild type BPV1 L1 genes.
Cos-1 cells were transfected with a synthetic L1
expression plasmid pCDNA/HBL1, and a wild type L1
expression plasmid pCDNA/BPVL1wt. The expression of
L1 was detected by immunofluorescent staining. Cells
were fixed after 36 hrs and incubated with rabbit
anti-BPV1 L1 antiserum, followed by FITC-conjugated
goat anti-rabbit IgG antibody.

Figure 2B shows detection by Western blot
of L1 protein from Cos-1 cells transfected with
pCDNA/HBL1 and pCDNA/BPVL1wt.

Figure 2C shows a Northern blot in which L1
mRNA extracted from transfected cells was probed with
32P-labeled probes produced from wild type L1 sequence.
The amount of mRNA loaded in respective lanes was
examined by hybridization of the mRNA sample with a
gapdh probe.

Figure 3A shows detection of L2 protein
expressed from synthetic and wild type BPV1 L2 genes.
Cos-1 cells were transfected with a synthetic L2
expression plasmid pCDNA/HBL2, and a wild type L2
expression plasmid pCDNA/BPVL2wt. The expression of
L2 was detected by immunofluorescent staining. Cells
were fixed after 36 hrs and incubated with rabbit
anti-BPV1 L2 antiserum, followed by FITC-conjugated
goat anti-rabbit IgG antibody.

Figure 3B shows detection by Western blot
of L2 protein from Cos-1 cells transfected with
pCDNA/HBL2 and pCDNA/BPVL2wt.

Figure 3C shows a Northern blot in which L2
mRNA extracted from transfected cells was probed with
32P-labeled probes produced from wild type L2 sequence.
The amount of mRNA loaded in respective lanes was
examined by hybridization of the mRNA sample with a
gapdh probe.

Figure 4 shows in vitro translation of
BPVL1 sequences, wild type BPVL1 (wt) or synthetic L1
(HB), using rabbit reticulocyte lysate or wheat germ
extract in the presence of 35S-methionine. In the top
panel, wt L1 or HB L1 plasmid DNA was added to the T7
DNA polymerase-coupled in vitro translation system.
L1 protein was detected by Western blot analysis. In
the bottom panel, the translation efficiency of wt L1
or HB L1 sequences in the presence or absence of tRNA
was compared. Translation was carried out in rabbit
reticulocyte lysate (rabbit) or wheat germ extract
(wheat), and samples were collected every two minutes
starting from minute 8. Left side of lower panel
indicates if 10"^ M bovine liver or yeast tRNA was
supplied.

Figure 5A is a schematic representation of
plasmids used to determine L2 expression from BPV
cryptic promoter(s). The wild type L1 sequence and
most of the wild type L2 sequence were deleted from
the BPV1 genome by BamHI and HindIII digestion and the
remaining BPV1 sequence (in yellow) was cloned into
pUC18. Wild type or synthetic humanized L2 sequences
(in red) were inserted into the BamHI site of the BPV1
genome. The position of the inserted SV40 ori
sequence (in white) is indicated. The plasmid in
which modified L2 was used but without SV40 ori
sequence was also used as a control. The plasmids
were transfected into Cos-1 cells and the expression
of L2 protein was determined using BPV1 L2-specific
polyclonal antiserum followed by FITC-linked
anti-rabbit IgG.

Figure 5B shows expression of L2 protein
from native papillomavirus promoter. The plasmids
shown in Figure 5A were used to transfect Cos-1 cells
and the expression of L2 protein was determined using
BPV1 L2-specific polyclonal antiserum followed by
FITC-linked anti-rabbit IgG. A mock transfection in
which the cells did not receive plasmid was used as
control.

Figure 6 shows expression of GFP in Cos-1
cells transfected with wild-type gfp (wt) or a
synthetic gfp gene carrying codons used at relatively
high frequency by papillomavirus genes (p). The mRNA
extracted from cells transfected with gfp or P gfp was
probed with 32P-labeled gfp probe and is shown on the
right panel, using gapdh as a reference gene.

Figure 7 shows the expression pattern of
GFP in vivo from wild-type gfp gene, or a synthetic
gfp gene carrying codons used at relatively high
frequency by papillomavirus genes. Using a gene gun,
mice were shot with PGFP (left panel) and GFP (right
panel) expression plasmids encoding GFP protein. A
transverse section of the mouse skin shows
where the gfp gene is expressed. Bright-field
photographs of the same section, in which dermis (D)
and epidermis (E) are highlighted, are shown to identify
the location of fluorescence in the epidermis. Arrows
indicate fluorescent signals.

DETAILED DESCRIPTION

The present invention arises from the
unexpected discovery that the relative abundance of
different isoaccepting transfer RNAs varies in
different cells or tissues, or alternatively in cells
or tissues in different states of differentiation or
in different stages of the cell cycle, and that such
differences may be exploited together with codon
composition of a gene to regulate and direct
expression of a protein to a particular cell or
tissue, or alternatively to a cell or tissue in a
specific state of differentiation or in a specific
stage of the cell cycle. According to the present
invention, this selective targeting is effected by
replacing at least one existing codon of a parent
nucleic acid sequence encoding the protein with a
synonymous codon.

Substitution of synonymous codons for
existing codons is not new per se. In this regard, we
refer to International Application Publication No. WO
96/09378, which utilizes such substitution to provide a
method of expressing proteins of eukaryotic and viral
origin at high levels in in vitro mammalian cell
culture systems, the main thrust of the method being
the harvesting of such proteins. In distinct
contrast, the present invention utilizes substitution
of one or more codons in a gene for targeting
expression of the gene to particular cells or tissues
with the ultimate aim of facilitating gene therapy as
described herein.

The term "synonymous codon" as used herein
refers to a codon having a different nucleotide
sequence to an existing codon but encoding the same
amino acid as the existing codon.
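
The definition above may be illustrated by the following minimal Python sketch, which lists the synonymous codons of a given codon under the standard genetic code. The function name `synonymous_codons` is hypothetical and is not part of the specification; the sketch assumes the standard (nuclear) genetic code applies.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; amino acids are listed with the first codon
# position varying slowest and the third fastest, each in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(codon):
    """Return the other codons that encode the same amino acid as `codon`."""
    codon = codon.upper().replace("U", "T")  # accept RNA or DNA spelling
    amino_acid = CODON_TABLE[codon]
    return sorted(
        c for c, aa in CODON_TABLE.items() if aa == amino_acid and c != codon
    )
```

For example, `synonymous_codons("gca")` lists the three other alanine codons, whereas a codon such as ATG (methionine) has no synonymous codon and yields an empty list.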

By "isoaccepting transfer RNA" is meant one
or more transfer RNA molecules that differ in their
anticodon structure but are specific for the same
amino acid.

Throughout this specification, unless the
context requires otherwise, the words "comprise",
"comprises" and "comprising" will be understood to
imply the inclusion of a stated integer or group of
integers but not the exclusion of any other integer or
group of integers.

Selection of synonymous codons

Determination of relative abundance of
different tRNA species in different cells

Advantageously, the synonymous codon
corresponds to an isoaccepting transfer RNA (iso-tRNA)
which, when compared to an iso-tRNA corresponding to the
at least one existing codon, is in higher abundance in the
target cell or tissue relative to one or more other
cells or tissues of the mammal.

Any method for determining the relative
abundance of an iso-tRNA in two or more cells or
tissues may be employed. For example, such method may
include isolating two or more particular cells or
tissues from a mammal, preparing an RNA extract from
each cell or tissue which extract includes tRNA, and
probing each extract respectively with different
nucleic acid sequences each being specific for a
particular iso-tRNA to determine the relative
abundance of an iso-tRNA between the two or more cells
or tissues.

Suitable methods for isolating particular
cells or tissues are well known to those of skill in
the art. For example, one can take advantage of one
or more particular characteristics of a cell or tissue
to specifically isolate the cell or tissue from a
heterogeneous population. Such characteristics
include, but are not limited to, anatomical location
of a tissue, cell density, cell size, cell morphology,
cellular metabolic activity, cell uptake of ions such
as Ca2+, K+, and H+ ions, cell uptake of compounds such
as stains, markers expressed on the cell surface,
cytokine expression, protein fluorescence, and
membrane potential. Suitable methods that may be used
in this regard include surgical removal of tissue,
flow cytometry techniques such as fluorescence-activated
cell sorting (FACS), immunoaffinity
separation (e.g., magnetic bead separation such as
Dynabead™ separation), density separation (e.g.,
metrizamide, Percoll™, or Ficoll™ gradient
centrifugation), and cell-type specific density
separation (e.g., Lymphoprep™). For example, dividing
cells or blast cells may be separated from non-dividing
cells or resting cells according to cell size
by FACS or metrizamide gradient separation.

Any suitable method for isolating total RNA
from a cell or tissue may be used. Typical procedures
contemplated by the invention are described in CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY (Ausubel, et al., eds)
(John Wiley & Sons, Inc. 1997), hereby incorporated by
reference, at page 4.2.1 through page 4.2.7.
Preferably, techniques which favor isolation of tRNA
are employed as, for example, described in
Brunngraber, E.F. (1962, Biochem. Biophys. Res.
Commun. 8:1-3) which is hereby incorporated by
reference.

The probing of an RNA extract is suitably
effected with different oligonucleotide sequences each
being specific for a particular iso-tRNA. Of course
it will be appreciated that for a given mammal,
oligonucleotide sequences would need to be selected
which hybridize specifically with particular iso-tRNA
sequences expressed by the mammal. Such selection is
well within the realm of one of ordinary skill in the
art based on a known iso-tRNA sequence. For example, in
the case of a mouse, exemplary oligonucleotide
sequences which may be used include those described in
Gauss and Sprinzel (1983, Nucleic Acids Res. 11 (1))
hereby incorporated by reference. In this respect,
the oligonucleotide sequences may be selected from the
group consisting of:
5'-TAAGGACTGTAAGACTT-3' (SEQ ID NO: 13) for Ala
5'-CGAGCCAGCCAGGAGTC-3' (SEQ ID NO: 14) for Arg
5'-CTAGATTGGCAGGi^TT-3' (SEQ ID NO: 15) for Asn
5'-TAAGATATATAGATTAT-3' (SEQ ID NO: 16) for Asp
5'-AAGTCTTAGTAGAGATT-3' (SEQ ID NO: 17) for Cys
5'-TATTTCTACACAGCATT-3' (SEQ ID NO: 18) for Glu
5'-CTAGGACAATAGGAATT-3' (SEQ ID NO: 19) for Gln
5'-TACTCTCTTCTGGGTTT-3' (SEQ ID NO: 20) for Gly
5'-TGCCGTGACTCGGATTC-3' (SEQ ID NO: 21) for [...]
5'-TAGAAATAAGAGGGCTT-3' (SEQ ID NO: 22) for Ile
5'-TACTTTTATTTGGATTT-3' (SEQ ID NO: 23) for Leu
5'-TATTAGGGAGAGGATTT-3' (SEQ ID NO: 24) for [...]
5'-TCACTATGGAGATTTTA-3' (SEQ ID NO: 25) for Lys
5'-CGCCCAACGTGGGGCTC-3' (SEQ ID NO: 26) for [...]
5'-TAGTACGGGAAGGATTT-3' (SEQ ID NO: 27) for [...]
5'-TGTTTATGGGATACAAT-3' (SEQ ID NO: 28) for [...]
5'-TCAAGAAGAAGGAGCTA-3' (SEQ ID NO: 29) for Pro
5'-GGGCTCGTCCGGGATTT-3' (SEQ ID NO: 30) for Pro
5'-ATAAGAAAGGAAGATCG-3' (SEQ ID NO: 31) for Ser
5'-TGTCTTGAGAAGAGAAG-3' (SEQ ID NO: 32) for Thr
5'-TGGTAAAAAGAGGATTT-3' (SEQ ID NO: 33) for [...]
5'-TCAGAGTGTTCATTGGT-3' (SEQ ID NO: 34) for [...]

Typically, the relative abundance of iso-tRNA
species may be determined by blotting techniques
that include a step whereby sample RNA or tRNA extract
is immobilized on a matrix (preferably a synthetic
membrane such as nitrocellulose), a hybridization
step, and a detection step. Northern blotting may be
used to identify an RNA sequence that is complementary
to a nucleic acid probe. Alternatively, dot blotting
and slot blotting can be used to identify
complementary DNA/RNA or RNA/RNA nucleic acid
sequences. Such techniques are well known by those
skilled in the art, and have been described in
Ausubel, et al. (supra) at pages 2.9.1 through 2.9.20.

According to such methods, a sample of tRNA
immobilized on a matrix is hybridized under stringent
conditions to a complementary nucleotide sequence
(such as those mentioned above) which is labeled, for
example, radioactively, enzymatically or
fluorochromatically.

"Stringency" as used herein, refers to the
temperature and ionic strength conditions, and
presence or absence of certain organic solvents,
during hybridization. The higher the stringency, the
higher will be the degree of complementarity between
the immobilized nucleotide sequences (i.e., iso-tRNA)
and the labeled oligonucleotide sequence. For a
discussion of typical stringent conditions that may be
used, see CURRENT PROTOCOLS IN MOLECULAR BIOLOGY supra
at pages 2.10.1 to 2.10.16, and Sambrook et al. in
MOLECULAR CLONING. A LABORATORY MANUAL (Cold Spring
Harbor Press, 1989), hereby incorporated by reference,
at sections 1.101 to 1.104.

While stringent washes are typically
carried out at temperatures from about 42°C to 68°C,
one skilled in the art will appreciate that other
temperatures may be suitable for stringent conditions.
Maximum hybridization typically occurs at about 20° to
25° below the Tm for formation of a DNA-DNA hybrid. It
is well known in the art that the Tm is the melting
temperature, or temperature at which two complementary
nucleic acid sequences dissociate. Methods for
estimating Tm are well known in the art (see CURRENT
PROTOCOLS IN MOLECULAR BIOLOGY supra at page 2.10.8).
Maximum hybridization typically occurs at about 10° to
15° below the Tm for a DNA-RNA hybrid.
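
As a rough illustration of the temperature offsets stated above, the following Python sketch estimates the Tm of a short oligonucleotide probe and the corresponding hybridization range. It uses the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is an assumption made here for illustration and is not necessarily the estimation method of the cited protocol; the function names are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate Tm (deg C) of a short DNA oligonucleotide by the Wallace rule."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


# Offsets below Tm (deg C) at which maximum hybridization typically occurs,
# as stated in the text above.
OFFSETS_BELOW_TM = {"DNA-DNA": (20, 25), "DNA-RNA": (10, 15)}


def hybridization_window(tm, hybrid="DNA-RNA"):
    """Return (low, high) temperatures for maximum hybridization."""
    near, far = OFFSETS_BELOW_TM[hybrid]
    return (tm - far, tm - near)


# The 17-mer of SEQ ID NO: 13 is used here purely as sample input.
tm = wallace_tm("TAAGGACTGTAAGACTT")
window = hybridization_window(tm, "DNA-RNA")
```

The Wallace rule is only a coarse approximation for short probes; salt concentration, formamide and probe concentration all shift the actual Tm, so any computed range would need empirical confirmation.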

Other stringent conditions are well known
in the art. A skilled addressee will recognize that
various factors can be manipulated to optimize the
specificity of the hybridization. Optimization of the
stringency of the final washes can serve to ensure a
high degree of hybridization.

Methods for detecting labeled nucleotide 
sequences hybridized to an immobilized nucleotide 
sequence are well known to practitioners in the art. 

15 Such methods include autoradiography, 

chemiluminescent , fluorescent and colorimetric 
detection. 

Advantageously, the relative abundance of
an iso-tRNA in two or more cells or tissues may be
determined by comparing the respective levels of
binding of a labeled nucleotide sequence specific for
the iso-tRNA to equivalent amounts of immobilized RNA
obtained from the two or more cells or tissues.
Similar comparisons are suitably carried out to
determine the respective relative abundance of other
iso-tRNAs in the two or more cells or tissues. One of
ordinary skill in the art will thereby be able to
determine a relative tRNA abundance table (see for
example TABLE 2) for different cells or tissues. From
such comparisons, one or more synonymous codons may be
selected such that the or each synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to an existing codon of the
parent nucleic acid sequence, is in higher abundance
in the target cell or tissue relative to other cells
or tissues of the mammal.
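
The comparison described above may be sketched, purely for
illustration, as a small table-building routine. The sketch
assumes that binding signals have been measured on equivalent
amounts of immobilized RNA and are supplied as plain numbers;
the function name, the data layout and the example labels are
hypothetical.

```python
# Illustrative sketch: build a relative iso-tRNA abundance table from
# probe binding signals. signals[iso_trna][cell] is the level of binding
# of a labeled probe specific for that iso-tRNA to RNA from that cell.

def relative_abundance(
    signals: dict[str, dict[str, float]], target: str
) -> dict[str, dict[str, float]]:
    """Return, per iso-tRNA, the ratio of target-cell signal to each other cell."""
    table: dict[str, dict[str, float]] = {}
    for iso_trna, by_cell in signals.items():
        target_signal = by_cell[target]
        table[iso_trna] = {
            cell: target_signal / level
            for cell, level in by_cell.items()
            # Skip the target itself and cells with no detectable signal.
            if cell != target and level > 0
        }
    return table
```

A ratio above 1 for a given iso-tRNA indicates higher abundance
in the target cell than in the compared cell, so that a codon
corresponding to that iso-tRNA is a candidate synonymous codon.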

Advantageously, a synonymous codon is
selected such that its corresponding iso-tRNA in the
target cell or tissue is at a level which is at least
110%, preferably at least 200%, more preferably at
least 500%, and most preferably at least 1000%, of
that expressed in the or each other cell or tissue of
the mammal.
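
As a minimal sketch of how the above thresholds might be
applied, the following routine classifies an iso-tRNA by the
ratio of its level in the target cell to its level in another
cell. The tier labels and the function name are hypothetical
and are used here only to mirror the wording "at least",
"preferably", "more preferably" and "most preferably".

```python
# Illustrative sketch: map a target/other abundance ratio onto the
# thresholds stated in the text (110%, 200%, 500%, 1000%).

TIERS = [
    (10.0, "most preferred"),  # at least 1000%
    (5.0, "more preferred"),   # at least 500%
    (2.0, "preferred"),        # at least 200%
    (1.1, "minimum"),          # at least 110%
]


def preference_tier(target_level: float, other_level: float):
    """Return the highest tier satisfied, or None if below 110%."""
    if other_level <= 0:
        raise ValueError("other_level must be positive to form a ratio")
    ratio = target_level / other_level
    for threshold, label in TIERS:
        if ratio >= threshold:
            return label
    return None
```

Where several other cells or tissues are compared, the routine
would be applied to each in turn, since the text requires the
threshold to be met for "the or each other cell or tissue".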

Suitably, synonymous codons for selective
expression of a protein in a differentiated cell,
preferably a differentiated keratinocyte, are selected
from the group consisting of gca (Ala), cuu (Leu) and
cua (Leu).

Synonymous codons for selective expression
of a protein in an undifferentiated cell, preferably
an undifferentiated keratinocyte, are suitably
selected from the group consisting of cga (Arg), cci
(Pro) and aag (Asn).



Analysis of codon usage 

Alternatively, synonymous codons may be
selected by analyzing the frequency at which codons
are used by genes expressed in (i) particular cells or
tissues, (ii) substantially all cells or tissues of
the mammal, or (iii) an organism which may infect
particular cells or tissues of the mammal.

Codon frequency tables as well as suitable
methods for determining frequency of codon usage in an
organism are described, for example, in an article by
Sharp et al (1988, Nucleic Acids Res. 16 8207-8211)
which is hereby incorporated by reference.

The relative level of gene expression
(e.g., detectable protein expression vs no detectable
protein expression) can provide an indirect measure of
the relative abundance of specific iso-tRNAs expressed
in different cells or tissues. For example, a virus
may be capable of propagating within a first cell or
tissue (which may include a cell or tissue at a
specific stage of differentiation) but may be
substantially incapable of propagating in a second
cell or tissue (which may include a cell or tissue at
another stage of differentiation). Comparison of the
pattern of codon usage by genes of the virus with the
pattern of codon usage by genes expressed in the
second cell or tissue may thus provide indirectly a
set of synonymous codons which correspond to iso-tRNAs
expressed at relatively high abundance in the first
cell or tissue relative to the second cell or tissue
and vice versa. Simultaneously, the above comparison
may also provide indirectly a set of synonymous codons
which correspond to iso-tRNAs expressed at relatively
high abundance in the second cell or tissue relative
to the first cell or tissue.
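
The comparison of codon usage patterns described above may be
sketched as follows. This is an illustrative simplification
only: it compares raw codon frequencies rather than
frequencies within each synonymous family, and the function
names, the ratio cut-off and the pseudocount are assumptions
made for the example.

```python
# Illustrative sketch: find codons used more often by one set of genes
# (e.g. genes of a virus) than by another (e.g. genes expressed in a
# second cell or tissue). Sequences are in-frame coding sequences.
from collections import Counter


def codon_frequencies(coding_sequences):
    """Return {codon: fraction of all codons} over the given sequences."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.lower().replace("t", "u")  # accept DNA or RNA letters
        usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
        counts.update(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def enriched_codons(first_genes, second_genes, min_ratio=2.0, pseudo=1e-3):
    """Return sorted codons at least min_ratio times more frequent in first_genes."""
    freq_a = codon_frequencies(first_genes)
    freq_b = codon_frequencies(second_genes)
    codons = set(freq_a) | set(freq_b)
    return sorted(
        c for c in codons
        # The pseudocount avoids division by zero for codons absent from one set.
        if (freq_a.get(c, 0.0) + pseudo) / (freq_b.get(c, 0.0) + pseudo) >= min_ratio
    )
```

Calling enriched_codons with the two gene sets in the opposite
order gives the reverse comparison referred to in the text.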

From the foregoing, a synonymous codon
according to the invention may correspond to a codon
including, but not limited to, (1) a codon used at
relatively high frequency by genes, preferably highly
expressed genes, of the target cell or tissue, (2) a
codon used at relatively high frequency by genes,
preferably highly expressed genes, of the or each
other cell or tissue, (3) a codon used at relatively
high frequency by genes, preferably highly expressed
genes, of the mammal, (4) a codon used at relatively
low frequency by genes of the target cell or tissue,
(5) a codon used at relatively low frequency by genes
of the or each other cell or tissue, (6) a codon used
at relatively low frequency by genes of the mammal,
(7) a codon used at relatively high frequency by genes
of another organism, and (8) a codon used at
relatively low frequency by genes of another organism.

For example, codons used at a relatively
high frequency by genes, preferably highly expressed
genes, of the mammal may be selected from the group
consisting of: cuc (Leu), cuu (Leu), cug (Leu), uua
(Leu), uug (Leu); cgg (Arg), cgc (Arg), aga (Arg), agg
(Arg); agu (Ser), agc (Ser), ucu (Ser), ucc (Ser), and
uca (Ser). Alternatively, such codons may include auu
(Ile), auc (Ile); guu (Val), guc (Val), gug (Val); acu
(Thr), acc (Thr), aca (Thr); gcu (Ala), gcc (Ala), gca
(Ala); cag (Gln); ggc (Gly), gga (Gly), ggg (Gly).

Codons used at a relatively low frequency
by genes of the mammal are described, for example, in
Sharp et al (1988, supra). Such codons may comprise
cua (Leu); cga (Arg), cgu (Arg); ucg (Ser).
Alternatively, such codons may include aua (Ile); gua
(Val); acg (Thr); gcg (Ala); caa (Gln); ggu (Gly).

Construction of synthetic nucleic acid sequences 

The step of replacing existing codons with
synonymous codons may be effected by any suitable
technique. For example, in vitro mutagenesis methods
may be employed which are well known to those of skill
in the art. Suitable mutagenesis methods are
described for example in the relevant sections of
Ausubel, et al. (supra) and of Sambrook, et al.,
(supra) which are hereby incorporated by reference.
Alternatively, suitable methods for altering DNA are
set forth, for example, in U.S. Patent Nos 4,184,917,
4,321,365 and 4,351,901, which are hereby incorporated
by reference. Instead of in vitro mutagenesis, the
second nucleic acid sequence may be synthesized de
novo using readily available machinery. Sequential
synthesis of DNA is described, for example, in U.S.
Patent No 4,293,652, which is hereby incorporated by
reference. However, it should be noted that the
present invention is not dependent on and not directed
to any one particular technique for replacing
existing codons with synonymous codons.

It is not necessary to replace all the
existing codons of the parent nucleic acid sequence
with synonymous codons each corresponding to an
iso-tRNA expressed in relatively high abundance in the
target cell compared to other cells. Increased
expression may be accomplished even with partial
replacement. Preferably, the replacing step affects
5%, 10%, 15%, 20%, 25%, 30%, more preferably 35%, 40%,
50%, 60%, 70% or more of the existing codons of the
parent nucleic acid sequence.
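
Partial replacement of the kind described above may be
sketched in silico as follows. The sketch is illustrative
only and is not the technique of the invention, which is not
limited to any particular replacement method. The function
name, the substitution table and the max_fraction parameter
are hypothetical, and the caller is assumed to supply only
synonymous substitutions.

```python
# Illustrative sketch: replace existing codons of a parent coding
# sequence with chosen synonymous codons, up to a maximum fraction of
# all codons, and report the fraction actually replaced.

def replace_codons(parent: str, substitutions: dict, max_fraction: float = 1.0):
    """Return (synthetic_sequence, fraction_of_codons_replaced)."""
    seq = parent.lower().replace("t", "u")  # accept DNA or RNA letters
    if not seq or len(seq) % 3:
        raise ValueError("parent must be a non-empty, in-frame coding sequence")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    budget = int(len(codons) * max_fraction)  # most codons that may be changed
    replaced = 0
    for i, codon in enumerate(codons):
        if replaced >= budget:
            break
        new = substitutions.get(codon)
        if new is not None and new != codon:
            codons[i] = new
            replaced += 1
    return "".join(codons), replaced / len(codons)
```

For example, with the hypothetical table {"gcc": "gca",
"cug": "cuu"}, which substitutes the codons gca (Ala) and cuu
(Leu) mentioned above for other codons of the same amino
acids, the parent sequence auggccgcccug becomes auggcagcacuu
with 75% of its codons replaced.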

The parent nucleic acid sequence is
preferably a natural gene. By "natural gene" is meant
a gene that naturally encodes the protein. However,
it is possible that the parent nucleic acid sequence
encodes a protein that is not naturally-occurring but
has been engineered using recombinant techniques.

The parent nucleic acid sequence need not
be obtained from the mammal but may be obtained from
any suitable source such as from a eukaryotic or
prokaryotic organism. For example, the parent nucleic
acid sequence may be obtained from another mammal or
other animal. Alternatively, the parent nucleic acid
sequence may be obtained from a pathogenic organism.
In such a case, a natural host of the pathogenic
organism is preferably a mammal. For example, the
pathogenic organism may be a yeast, bacterium or
virus.

For example, suitable proteins which may be
used for selective expression in accordance with the
invention include, but are not limited to, the cystic
fibrosis transmembrane conductance regulator (CFTR)
protein, and adenosine deaminase (ADA). In the case
of CFTR, a parent nucleic acid sequence encoding the
CFTR protein which may be utilized to produce the
synthetic nucleic acid sequence is described, for
example, in Riordan et al (1989, Science 245
1066-1073), and in the GenBank database under Accession No.
HUMCFTRM, which are hereby incorporated by reference.

The term "nucleic acid sequence" as used 
herein designates mRNA, RNA, cRNA, cDNA or DNA. 

Regulatory nucleotide sequences which may
be utilized to regulate expression of the synthetic
nucleic acid sequence include, but are not limited to,
a promoter, an enhancer, and a transcriptional
terminator. Such regulatory sequences are well known
to those of skill in the art.

Synthetic nucleic acid sequences according
to the invention may be operably linked to one or more
regulatory sequences in the form of an expression
vector. By "vector" is meant a nucleic acid molecule,
preferably a DNA molecule derived, for example, from a
plasmid, bacteriophage, or mammalian or insect virus,
into which a synthetic nucleic acid sequence may be
inserted or cloned. A vector preferably contains one
or more unique restriction sites and may be capable of
autonomous replication in a defined host cell
including the target cell or tissue or a precursor
cell or precursor tissue thereof, or be integratable
with the genome of the defined host such that the
cloned sequence is reproducible. Thus, by "expression
vector" is meant any autonomous element capable of
directing the synthesis of a protein. Such expression
vectors are well known by practitioners in the art.

The term "precursor cell" as used herein
refers to a cell that gives rise to the target cell.

The invention also contemplates synthetic
nucleic acid sub-sequences encoding desired portions
of the protein. A nucleic acid sub-sequence encodes a
domain of the protein having a function associated
therewith and preferably encodes at least 10, 20, 50,
100, 150, or 500 contiguous amino acids of the
protein.

The step of introducing the synthetic
nucleic acid sequence into a target cell will differ
depending on the intended use and/or species, and may
involve non-viral and viral vectors, cationic
liposomes, retroviruses and adenoviruses such as, for
example, described in Mulligan, R.C., (1993 Science
260 926-932) which is hereby incorporated by
reference. Such methods may include:

(i) Local application of the synthetic
nucleic acid sequence by injection (Wolff et al.,
1990, Science 247 1465-1468, which is hereby
incorporated by reference), surgical implantation,
instillation or any other means. This method may also
be used in combination with local application by
injection, surgical implantation, instillation or any
other means, of cells responsive to the protein
encoded by the synthetic nucleic acid sequence so as
to increase the effectiveness of that treatment. This
method may also be used in combination with local
application by injection, surgical implantation,
instillation or any other means, of another factor or
factors required for the activity of said protein.

(ii) General systemic delivery by
injection of DNA, (Calabretta et al., 1993, Cancer
Treat. Rev. 19 169-179, which is hereby incorporated
by reference), or RNA, alone or in combination with
liposomes (Zhu et al., 1993, Science 261 209-212,
which is hereby incorporated by reference), viral
capsids or nanoparticles (Bertling et al., 1991,
Biotech. Appl. Biochem. 13 390-405, which is hereby
incorporated by reference) or any other mediator of
delivery. Improved targeting might be achieved by
linking the synthetic nucleic acid sequence to a
targeting molecule (the so-called "magic bullet"
approach employing for example, an antibody), or by
local application by injection, surgical implantation
or any other means, of another factor or factors
required for the activity of the protein produced from
said synthetic nucleic acid sequence, or of cells
responsive to said protein.

(iii) Injection or implantation or delivery
by any means, of cells that have been modified ex vivo
by transfection (for example, in the presence of
calcium phosphate: Chen et al., 1987, Mole. Cell
Biochem. 7 2745-2752, or of cationic lipids and
polyamines: Rose et al., 1991, BioTech. 10 520-525,
which articles are hereby incorporated by reference),
infection, injection, electroporation (Shigekawa et
al., 1988, BioTech. 6 742-751, which is hereby
incorporated by reference) or any other way so as to
increase the expression of said synthetic nucleic acid
sequence in those cells. The modification may be
mediated by plasmid, bacteriophage, cosmid, viral
(such as adenoviral or retroviral; Mulligan, 1993,
Science 260 926-932; Miller, 1992, Nature 357 455-460;
Salmons et al., 1993, Hum. Gen. Ther. 4 129-141, which
articles are hereby incorporated by reference) or
other vectors, or other agents of modification such as
liposomes (Zhu et al., 1993, Science 261 209-212,
which is hereby incorporated by reference), viral
capsids or nanoparticles (Bertling et al., 1991,
Biotech. Appl. Biochem. 13 390-405, which is hereby
incorporated by reference), or any other mediator of
modification. The use of cells as a delivery vehicle
for genes or gene products has been described by Barr
et al., 1991, Science 254 1507-1512 and by Dhawan et
al., 1991, Science 254 1509-1512, which articles are
hereby incorporated by reference. Treated cells may
be delivered in combination with any nutrient, growth
factor, matrix or other agent that will promote their
survival in the treated subject.

In yet another aspect, the invention
provides a pharmaceutical composition comprising the
synthetic nucleic acid sequences of the invention and a
pharmaceutically acceptable carrier.

By "pharmaceutically-acceptable carrier" is
meant a solid or liquid filler, diluent or
encapsulating substance that may be safely used in
systemic administration. Depending upon the
particular route of administration, a variety of
pharmaceutically acceptable carriers, well known in
the art, may be used. These carriers may be selected
from a group including sugars, starches, cellulose and
its derivatives, malt, gelatin, talc, calcium sulfate,
vegetable oils, synthetic oils, polyols, alginic acid,
phosphate buffered solutions, emulsifiers, isotonic
saline, and pyrogen-free water.

Any suitable technique may be employed for
determining expression of the protein from said
synthetic nucleic acid sequence in a particular cell
or tissue. For example, expression can be measured
using an antibody specific for the protein of interest
or portion thereof. Such antibodies and measurement
techniques are well known to those skilled in the art.

Applications 

In one embodiment of the present invention,
the target cell is suitably a differentiated cell.
Advantageously, the protein which is desired to be
selectively expressed in the differentiated cell is
not expressible in a precursor cell thereof (such as
an undifferentiated or less differentiated cell of the
mammal) from a parent nucleic acid sequence at a level
sufficient to effect a particular function associated
with said protein. In this embodiment, the step of
replacing at least one existing codon with a
synonymous codon is characterized in that the
synonymous codon corresponds to an iso-tRNA which,
when compared to the iso-tRNA corresponding to the at
least one existing codon, is in relatively higher
abundance in the differentiated cell compared to the
precursor cell. Accordingly, a synthetic nucleic acid
sequence is produced having altered translational
kinetics compared to the parent nucleic acid sequence
wherein the protein is expressible in the
differentiated cell at a level sufficient to effect a
particular function associated with said protein, but
wherein the protein is not expressible in the
precursor cell at a level sufficient to effect said
function.

As used herein, the term "function" refers
to a biological or therapeutic function.

The above embodiment may be utilized
advantageously for somatic gene therapy where
overexpression of a protein in undifferentiated cells
such as stem cells has undesirable consequences
including death or differentiation of the stem cells.
In such a case, a suitable protein may include cystic
fibrosis transmembrane conductance regulator (CFTR)
protein, and adenosine deaminase (ADA).

The differentiated cell may comprise a cell
of any lineage including a cell of epithelial,
hemopoietic or neural origin. For example, the
differentiated cell may be a mature differentiated
keratinocyte.

Targeting expression of a protein to
progeny of a stem cell but not to the stem cell itself

The synthetic nucleic acid sequence
produced above may be transfected directly into the
differentiated cell for the desired function or
alternatively, transfected into the precursor cell.
For example, in the case of ADA deficiency, expression
of ADA in stem cells may result in loss of stem
phenotype which is undesirable. However, an
advantageous therapy may reside in transducing
autologous marrow stem cells with a synthetic nucleic
acid sequence operably linked to one or more
regulatory sequences, wherein existing codons of the
wild type ADA gene have been replaced with synonymous
codons each corresponding to an iso-tRNA expressed in
relatively high abundance in differentiated
lymphocytes compared to the marrow stem cells. The
transduced stem cells may then be reinfused into the
patient. This approach will result in transduced
marrow stem cells which are not capable of expressing
ADA themselves, but which are able to give rise to a
renewable population of differentiated lymphocytes
which are capable of expressing ADA at levels
sufficient to permit a therapeutic effect. In this
regard, a suitable cell source for this purpose may
comprise stem cells isolated as CD34 positive cells
from a patient's peripheral blood or marrow. For gene
delivery, a suitable vector may include a retrovirus
or adeno-associated virus.

Alternatively, in the case of inducing cell
mediated immunity, dendritic cells are important
antigen presenting cells (APC) but, once activated,
have a very limited life span for antigen presentation
of between 14 and 21 days. Consequently, dendritic cells
provide relatively short-term immune stimulation that
may not be optimal. However, in accordance with the
present invention, a long-term immune stimulation may
be provided by transducing autologous bone
marrow-derived CD34 positive dendritic cell precursors with a
synthetic nucleotide sequence encoding an antigen,
such as the melanoma antigen MART-1, wherein the
synthetic sequence is operably linked to one or more
regulatory sequences, and wherein existing codons of a
wild type nucleotide sequence encoding MART-1 have
been replaced with synonymous codons each
corresponding to an iso-tRNA expressed in relatively
high abundance in dendritic cells compared to the
dendritic cell precursors. The transduced dendritic
cell precursors may then be reinfused into the
patient. This approach will result in transduced
dendritic cell precursors which are not capable of
expressing MART-1 themselves, but which are able to
give rise to a renewable population of dendritic cells
which are capable of expressing MART-1 at levels
sufficient to permit a lifelong intermittent
restimulation of a cytotoxic T lymphocyte (CTL)
response to the MART-1 antigen.

Targeting expression of a protein to a stem
cell but not to progeny of the stem cell

In an alternate embodiment, the target cell
may be an undifferentiated cell wherein the protein is
not expressible in said undifferentiated cell, from a
parent nucleic acid sequence encoding the protein, at
a level sufficient to effect a particular function
associated with the protein. In such a case, at least
one existing codon of the parent nucleic acid sequence
is replaced with a synonymous codon corresponding to
an iso-tRNA which, when compared to the iso-tRNA
corresponding to the at least one existing codon, is
in relatively higher abundance in the undifferentiated
cell compared to a differentiated cell. This results
in a synthetic nucleic acid sequence having altered
translational kinetics compared to said parent nucleic
acid sequence wherein the protein is expressible in
the undifferentiated cell at a level sufficient to
effect a particular function associated with the
protein, but wherein the protein is not expressible in
differentiated cells derived from the undifferentiated
cell at a level sufficient to effect said function.

This alternate embodiment may, by way of
example, be used to permit expression of a
transcriptional regulatory protein which when
expressed in a particular undifferentiated cell or
stem cell facilitates differentiation of the stem cell
along a particular cell lineage. It will be
appreciated that in such a case, the regulatory
protein is normally expressed from a gene in which the
existing codons correspond to iso-tRNAs which are in
relatively low abundance in the stem cell compared to
other iso-tRNAs and that therefore the protein is not
capable of being expressed at levels sufficient for
commitment of the stem cell to differentiate along a
particular cell lineage. It will also be apparent
that such commitment to differentiate along a
particular cell lineage may be utilized to prevent
production of a particular lineage of cells such as
cancer cells.

Alternatively, the method according to this
embodiment may be used to express a transcriptional
regulatory protein that is involved in the production
of a therapeutic agent or agents. Such a protein may
include, for example, NF-kappa-B transcription factor
p65 subunit (NF-kappa-B p65) which is involved in the
production of interleukin-2 (IL-2), interleukin-3
(IL-3) and granulocyte and macrophage colony stimulating
factor (GMCSF). NF-kappa-B p65 is encoded naturally by
a nucleotide sequence comprising a number of existing
codons each corresponding to an iso-tRNA expressed in
relatively low abundance in stem cells. Accordingly,
such sequence may be used as the parent nucleic acid
sequence according to this embodiment. A suitable
nucleotide sequence encoding this protein is
described, for example, in Lyle et al (1994, Gene 138
265-266) and in the EMBL database under Accession No
HSNFKB65A which are hereby incorporated by reference.

A suitable undifferentiated cell which may
be utilized in accordance with the present embodiment
includes but is not limited to a stem cell, such as a
CD34 positive hemopoietic stem cell.

The present embodiment may also be used
advantageously for gene therapy where ongoing
regulated expression of a transgene is desirable. For
example, secure but reversible regulation of fertility
is desirable in veterinary practice and in humans.
Such regulation may be effected by transducing
autologous breast ductal epithelial cells with a
synthetic nucleic acid encoding a luteinising hormone
(LH) antagonist or a luteinising hormone releasing
hormone (LHRH) antagonist under the control of one or
more regulatory sequences. The synthetic nucleic acid
may be produced by replacing existing codons of a
parent nucleic acid with synonymous codons
corresponding to iso-tRNAs expressed in relatively
high abundance in resting breast ductal epithelial
cells compared to differentiated cells arising
therefrom. Once the transduced cells are implanted
back into the patient, expression may be switched off
by oral administration of progestagen, forcing the
differentiation of the majority of the stem cells and
loss of expression of the antagonist. Once pregnancy
is established, the suppression would be self
sustaining by the naturally produced progestagen. The
iso-tRNA composition of resting and oestrogen-driven
breast epithelial cells may be established by first
obtaining resting cells from reduction mammoplasty,
and determining the cellular tRNA composition in the
presence and absence of oestrogen. The synthetic
nucleic acid sequence may be introduced into
autologous resting epithelial cells by cell
electroporation ex vivo, and the transduced cells may
be subsequently transplanted subcutaneously into the
patient. Progestagen may be administered as required
to reverse regulation of fertility.

Targeting expression of a toxin to a tumor
cell but not to any other cells of the mammal

Many toxins and drugs are available that
can kill tumor cells. However, these toxins and drugs
are generally toxic for all dividing cells. This
problem may be nevertheless ameliorated by
establishing the isoacceptor tRNA composition in a
tumor clone, and constructing a synthetic toxin gene
(e.g., ricin gene) or a synthetic anti-proliferation
gene (e.g., the tumor suppressor p53) using synonymous
codons corresponding to iso-tRNAs expressed at
relatively high abundance in the tumor clone compared
to normal dividing cells of the mammal. The synthetic
gene is then introduced into the patient by suitable
means to selectively express the synthetic genes in
tumor cells.

Alternatively, a chemotherapy enhancing
product gene (i.e., a drug resistance gene, e.g., the
multi-drug resistance gene) using a codon pattern
unlikely to be expressed in the tumor efficiently may
be employed.



Targeting gene therapy to control body fat

Leptins are proteins known to control
satiety. By analogy with animal data, however, if too
much leptin is administered to a patient,
leptin-induced starvation might occur. Advantageously, a
synthetic gene encoding leptin may be constructed
including synonymous codons corresponding to iso-tRNAs
expressed at relatively high levels in activated
adipocytes compared to non-activated adipocytes. The
synthetic gene may then be introduced into the patient
by suitable means such that leptin is only expressed
substantially in activated adipocytes as opposed to
non-activated adipocytes. As body fat turnover
diminishes under the influence of leptin-reduced
appetite, the metabolic activity of the adipocytes
falls and the leptin production decreases
correspondingly.



Targeting expression of a protein to a
stage of the cell cycle

In another embodiment of the invention, the
target cell may be a non-cycling cell. In this case,
the protein which is desired to be selectively
expressed in the non-cycling cell is expressible in a
cycling cell of the mammal from a parent nucleic acid
sequence at a level sufficient to effect a particular
function associated with the protein. The synonymous
codons are selected such that each corresponds to an
iso-tRNA which, when compared to the iso-tRNA
corresponding to the at least one existing codon, is
in higher abundance in the non-cycling cell compared
to the cycling cell. Accordingly, a synthetic nucleic
acid sequence is produced having altered translational
kinetics compared to the parent nucleic acid sequence
wherein the protein is expressible in the non-cycling
cell at a level sufficient to effect a particular
function associated with said protein, but wherein the
protein is not expressible in the cycling cell at a
level sufficient to effect said function.

The term "non-cycling cell" as used herein
refers to a cell that has withdrawn from the cell
cycle and has entered the G0 state. In this state, it
is well known that transcription of endogenous genes
and protein translation are at substantially reduced
levels compared to phases of the cell cycle, namely
G1, S, G2 and M.

By "cycling cell" is meant a cell which is
in one of the above phases of the cell cycle.

Expressing a protein in a target cell or
tissue by in vivo expression of iso-tRNAs in the
target cell or tissue

In another aspect, the invention extends to
a method wherein a protein may be selectively
expressed in a target cell by introducing into the
cell an auxiliary nucleic acid sequence capable of
expressing therein one or more isoaccepting transfer
RNAs which are not expressed in relatively high
abundance in the cell but which are rate limiting for
expression of the protein from a parent nucleic acid
sequence to a level sufficient for effecting a
function associated with the protein. In this
embodiment, introduction of the auxiliary nucleic acid
sequence in the cell changes the translational
kinetics of the parent nucleic acid sequence such that
said protein is expressed at a level sufficient to
effect a function associated with the protein.

The step of introducing the auxiliary
nucleic acid sequence into the target cell or a tissue
comprising a plurality of these cells may be effected
by any suitable means. For example, analogous
methodologies for introduction of the synthetic
nucleic acid sequence referred to above may be
employed for delivery of the auxiliary nucleic acid
sequence into said target cell.


Assembly of virus particles in cells which
do not normally permit assembly of virus particles

In yet another aspect, the invention
extends to a method for producing a virus particle in
a cycling eukaryotic cell. The virus particle will
comprise at least one protein necessary for virus
assembly, wherein the at least one protein is not
expressed in the cell from a parent nucleic acid
sequence at a level sufficient to permit virus
assembly therein. This method is characterized by
replacing at least one existing codon of the parent
nucleic acid sequence with a synonymous codon to
produce a synthetic nucleic acid sequence having
altered translational kinetics compared to the parent
nucleic acid sequence such that the at least one
protein is expressible from the synthetic nucleic acid
sequence in the cell at a level sufficient to permit
virus assembly therein. The synthetic nucleic acid
sequence so produced is operably linked to one or more
regulatory nucleotide sequences and is then introduced
into the cell or a precursor cell thereof. The at
least one protein is expressed subsequently in the
cell in the presence of other viral proteins required
for assembly of the virus particle to thereby produce
the virus particle.

Advantageously, the synonymous codon 
corresponds to an iso-tRNA expressed at relatively 
high level in the cell compared to the iso-tRNAs 
corresponding to the existing codons. 
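
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table is deliberately truncated to three amino acids, and the iso-tRNA abundance values passed in are hypothetical placeholders, not measurements from this specification.

```python
# Partial synonymous-codon table (three amino acids only, to keep the
# sketch short; a complete table would cover all 61 sense codons).
SYNONYMS = {
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "K": ["AAA", "AAG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def recode(seq, abundance):
    """Swap each codon for the synonymous codon whose iso-tRNA is most
    abundant in the target cell. `abundance` maps codon -> relative
    iso-tRNA level (hypothetical values supplied by the caller)."""
    out = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            # Codon not covered by the truncated table: leave unchanged.
            out.append(codon)
            continue
        best = max(SYNONYMS[aa], key=lambda c: abundance.get(c, 0.0))
        # Replace only when the alternative is strictly better.
        if abundance.get(best, 0.0) > abundance.get(codon, 0.0):
            out.append(best)
        else:
            out.append(codon)
    return "".join(out)


# Hypothetical relative iso-tRNA levels in the target cell.
example_abundance = {"CTG": 1.0, "TTA": 0.1, "CCC": 0.8, "CCA": 0.3,
                     "AAG": 0.9, "AAA": 0.4}
print(recode("TTACCAAAA", example_abundance))
```

The encoded amino acid sequence is unchanged because every replacement is drawn from the same synonym group as the codon it replaces.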

The cycling cell may be any cell in which
the virus is capable of replication. Suitably, the
cycling cell is a eukaryotic cell. Preferably, the
cycling cell for production of the virus particle is a
eukaryotic cell line capable of being grown in vitro
such as, for example, CV-1 cells, COS cells, yeast or
Spodoptera cells.

Suitably, the at least one protein of the
virus particle is a viral capsid protein. Preferably,
the viral capsid proteins comprise L1 and/or L2
proteins of papillomavirus.

The other viral proteins required for
assembly of the virus particle in the cell may be
expressed from another nucleic acid sequence(s) which
suitably contain the rest of the viral genome. In the
case of the at least one protein comprising L1 and/or
L2 of papillomavirus, said other nucleic acid
sequence(s) preferably comprises the papillomavirus
genome without the nucleotide sequences encoding L1
and/or L2.

In yet a further aspect of the invention,
there is provided a method for producing a virus
particle in a cycling cell, said virus particle
comprising at least one protein necessary for assembly
of said virus particle, wherein said at least one
protein is not expressed in said cell from a parent
nucleic acid sequence at a level sufficient to permit
virus assembly therein, and wherein at least one
existing codon of said parent nucleic acid sequence is
rate limiting for the production of said at least one
protein to said level, said method including the step
of introducing into said cell a nucleic acid sequence
capable of expressing therein an isoaccepting transfer
RNA specific for said at least one codon.

In yet a further aspect, the invention
resides in virus particles resulting from the above
methods.

The invention further contemplates cells or
tissues containing therein the synthetic nucleic acid
sequences of the invention, or alternatively, cells or
tissues produced from the methods of the invention.

The invention is further described with 
reference to the following non-limiting examples. 

EXAMPLE 1

Expression of synthetic L1 and L2 protein in
undifferentiated cells.

Materials and Methods

Codon replacements in the bovine PV (BPV) L1 and L2 genes

The DNA and amino acid sequences of the
wild-type L1 (SEQ ID NOS: 1, 2) and L2 genes (SEQ ID
NOS: 5, 6) are shown respectively in Figures 1A and 1B.
To determine whether the presence of rare codons in
wild-type L1 (SEQ ID NO: 1) and L2 (SEQ ID NO: 5) genes
(Table 1) inhibited translation, we synthesized the L1
(SEQ ID NO: 3) and L2 (SEQ ID NO: 7) genes by using
synonymous substitutions as shown. To construct the
synthetic sequences, we synthesized 11 pairs of
oligonucleotides for L1 and 10 pairs of
oligonucleotides for L2. Each pair of
oligonucleotides has restriction sites incorporated to
facilitate subsequent cloning (Figures 1A and 1B).
The degenerate oligonucleotides were used to amplify
L1 and L2 sequences by PCR using a plasmid with BPV1
genome as the template. The amplified fragments were
cut with appropriate enzymes and sequentially ligated
to pUC18 vector, producing pUCHBL1 and pUCHBL2. The
synthetic L1 (SEQ ID NO: 3) and L2 (SEQ ID NO: 7)
sequences were sequenced and found to be error-free,
and then sub-cloned into the mammalian expression
vector pCDNA3 containing SV40 ori (Invitrogen), giving
expression plasmids pCDNA/HBL1 and pCDNA/HBL2. To
compare expression of L1 and L2 with that of the
original sequences, the wild type L1 (SEQ ID NO: 1) and
L2 (SEQ ID NO: 5) genes were cloned into the pCDNA3
vector, resulting in pCDNA/BPVL1wt and pCDNA/BPVL2wt.



Immunofluorescence and Western blot staining

For immunoblotting assays, Cos-1 cells in
6-well plates were transfected with 2 µg L1 or L2
expression plasmids using lipofectamine (Gibco). 36
hrs after transfection, cells were washed with 0.15 M
phosphate-buffered 0.9% NaCl (PBS) and lysed in SDS
loading buffer. The cellular proteins were separated
by 10% SDS PAGE and blotted onto nitrocellulose
membrane. The L1 or L2 proteins were identified by
electrochemiluminescence (Amersham, UK), using BPV1 L1
(DAKO) or L2-specific (17) antisera. For
immunofluorescent staining, Cos-1 cells were grown on
8-chamber slides, transfected with plasmids, and
fixed and permeabilised with 85% ethanol 36 hrs after
transfection. The slides were blocked with 5% milk-PBS
and probed with L1 or L2-specific antisera, followed
by FITC-conjugated anti-rabbit IgG (Sigma). For GFP or
PGFP plasmid transfected cells, the cells were fixed
with 4% buffered formaldehyde and viewed by
epifluorescence microscopy.

Northern blotting

Cos cells transfected with various plasmids
were used to extract cytoplasmic or total RNA using
the QIAGEN RNeasy mini kit according to the
supplier's handbook. Briefly, for cytoplasmic RNA
purification, buffer RLN (50 mM Tris, pH 8.0, 140 mM
NaCl, 1.5 mM MgCl2 and 0.5% NP40) was directly added to
monolayer cells and cells were lysed at 4 °C for 5 min.
After the nuclei were removed by centrifugation,
cytoplasmic RNAs were purified by column. For total
RNA extraction, the monolayer cells were lysed using
buffer RLT supplied by the kit and RNA was purified by
spin column. The purified RNAs were separated by 1.5%
agarose gel in the presence of formaldehyde. The RNAs
were then blotted onto nylon membrane and probed with
(a) 1:1 mixed 5'-end labelled L1 wt and HBL1
fragments; (b) 1:1 mixed 5'-end labelled L2 wt and
HBL2 fragments; (c) 1:1 mixed 5'-end labelled GFP and
PGFP fragments; or (d) randomly labelled GAPDH
fragment. The blots were washed extensively at 65 °C
and exposed to X-ray films for three days.



Results 

To test the hypothesis that the codon
composition of the genes encoding the L1 and L2 capsid
proteins of papillomavirus (PV) contributes to their
preferential expression in differentiated epithelial
cells, we produced synthetic BPV1 L1 (SEQ ID NO: 3) and
L2 (SEQ ID NO: 7) genes, substituting codons
preferentially used in mammalian genes for the codons
frequently present in the wild type BPV1 L1 and L2
sequences which are rare in eukaryotic genes (Figures
1A, 1B).

For the L1 gene, a total of 202 base
substitutions were made in 196 codons, without
changing the encoded amino acid sequence (Figure 1A).
This synthetic "humanized" BPV L1 gene (SEQ ID NO: 3)
was designated HBL1. In a similarly modified BPV1 L2
gene (SEQ ID NO: 7) designated HBL2, 303 bases were
changed to substitute 290 less frequently used codons
with the corresponding preferentially used codons.
Using the synthetic HBL1 (SEQ ID NO: 3) and HBL2 (SEQ
ID NO: 7) genes, we constructed two eukaryotic
expression plasmids based on pCDNA3, and designated
pCDNA/HBL1 and pCDNA/HBL2. Similar expression
plasmids, constructed with the wild type BPV1 L1 (SEQ
ID NO: 1) and BPV1 L2 (SEQ ID NO: 5) genes, were
designated pCDNA/BPVL1wt and pCDNA/BPVL2wt,
respectively. In each of these plasmids the SV40 ori
allowed replication in Cos-1 cells, and the L1 or L2
gene was driven by a strong constitutive CMV promoter.
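
Counts such as "202 base substitutions in 196 codons" can be tallied directly from an aligned pair of coding sequences. The following is a minimal sketch, assuming both sequences are in frame and of equal length; the short sequences in the example are invented for illustration and are not taken from SEQ ID NOS: 1 or 3. The function does not itself check that each change is synonymous.

```python
def codon_changes(parent, synthetic):
    """Return (codons_changed, bases_changed) for two in-frame coding
    sequences of equal length."""
    if len(parent) != len(synthetic) or len(parent) % 3 != 0:
        raise ValueError("sequences must be equal-length and in frame")
    codons_changed = 0
    bases_changed = 0
    for i in range(0, len(parent), 3):
        old, new = parent[i:i + 3], synthetic[i:i + 3]
        diff = sum(a != b for a, b in zip(old, new))
        if diff:
            codons_changed += 1
            bases_changed += diff
    return codons_changed, bases_changed


# Invented 3-codon example: TTA->CTG (2 bases), CCA->CCC (1), AAA->AAG (1).
print(codon_changes("TTACCAAAA", "CTGCCCAAG"))
```

The number of changed bases exceeds the number of changed codons whenever a replacement codon differs from the original at more than one position.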

To compare the expression of the synthetic
humanized and the wild type BPV1 L1 or BPV1 L2 genes,
we separately transfected Cos-1 cells with each of the
L1 and L2 plasmids described above. Transfected cells
were analyzed for expression of L1 (SEQ ID NOS: 2, 4) or
L2 (SEQ ID NOS: 6, 8) protein by immunofluorescence 36 hr
after transfection (Figures 2A and 3A). Cells
transfected with the pCDNA3 expression plasmid
containing the synthetic humanized L1 (SEQ ID NO: 3) or
L2 (SEQ ID NO: 7) genes were observed to produce large
amounts of the corresponding protein, while cells
transfected with expression plasmids with the wild
type L1 (SEQ ID NO: 1) or L2 (SEQ ID NO: 5) sequences
produced no detectable L1 or L2 protein (Figures 2A
and 3A, see nuclear staining of L1 and L2 proteins).
To compare more accurately the expression of the
different L1 and L2 constructs, L1 and L2 protein
expression was assessed by immunoblot in Cos-1 cells
transfected with the wild type or synthetic humanized
BPV1 L1 or L2 pCDNA3 expression constructs (Figures 2B
and 3B). Large amounts of immunoreactive L1 and L2
proteins were expressed from the synthetic humanized
L1 (SEQ ID NO: 3) and L2 (SEQ ID NO: 7) sequences, but
no L1 or L2 protein was expressed from the wild type
L1 and L2 sequences (SEQ ID NOS: 1, 5).

To establish whether the alterations to the
primary sequence of the L1 and L2 mRNA which resulted
from the codon alterations also affected steady state
expression of the corresponding message, mRNA was
prepared from Cos-1 cells transfected with the various
capsid protein gene constructs. Using GAPDH as an
internal standard it was established by Northern blot
that two to three times more modified than wild type
L1 mRNA, and similar levels of wild type and modified
L2 mRNA were present in the cytoplasm of transfected
cells (Figures 2C and 3C). The amount of L1 or L2
protein expressed per arbitrary unit of L1 or L2 mRNA
was at least 100 fold higher for the humanized gene
constructs than for the natural gene constructs.
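
The "protein per arbitrary unit of mRNA" comparison can be illustrated as follows. This is a sketch only: the band intensities are hypothetical numbers chosen for the example, and because wild type protein was undetectable, its value is assumed here to sit at the assay's detection limit, so the resulting ratio is a lower bound rather than a measured value.

```python
def protein_per_mrna(protein_signal, mrna_signal, gapdh_signal):
    """Protein signal divided by GAPDH-normalised mRNA signal
    (all inputs in arbitrary densitometry units)."""
    normalised_mrna = mrna_signal / gapdh_signal
    return protein_signal / normalised_mrna


def fold_difference(modified, wild_type):
    """Ratio of modified to wild type translational yield."""
    return modified / wild_type


# Hypothetical readings: humanized construct has 2.5x the mRNA of wild type.
humanized = protein_per_mrna(protein_signal=250.0, mrna_signal=250.0,
                             gapdh_signal=100.0)
# Wild type protein assumed at a detection limit of 1.0 unit.
wild_type = protein_per_mrna(protein_signal=1.0, mrna_signal=100.0,
                             gapdh_signal=100.0)
print(fold_difference(humanized, wild_type))
```

Normalising to GAPDH first means that differences in RNA loading between lanes do not inflate or mask the per-message yield.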
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Papillomavirus late protein translation in vitro 

Materials and Methods

In vitro translation assay

One microgram of each plasmid was
incubated with 20 µCi ³⁵S-methionine (Amersham) and 40
µL T7 coupled rabbit reticulocyte or wheat germ
lysates (Promega). Translation was performed at 30 °C
and stopped by adding SDS loading buffer. The L1
proteins were separated by 10% SDS PAGE and examined
by autoradiography.



Production of aminoacyl-tRNA

2.5 x 10⁻⁴ M tRNA (Boehringer) was added to
a 20 µL reaction containing 10 mM Tris-acetate,
pH 7.8, 44 mM KCl, 12 mM MgCl2, 9 mM β-mercaptoethanol,
38 mM ATP, 0.25 mM GTP and 7 µL rabbit reticulocyte
extract. The reaction was carried out at 25 °C for 20
min, and 30 µL H2O was added to the reaction to dilute
the tRNAs to 1 x 10⁻⁴ M. The aminoacyl-tRNAs were then
aliquoted and stored at -70 °C.
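
The dilution step above is simple volume arithmetic and can be checked with a short calculation. The sketch below works on the mantissa only (2.5 concentration units in, 1.0 out), so it holds whatever power of ten the concentrations are expressed in; the function name is illustrative.

```python
def diluted_concentration(c_initial, v_initial_ul, v_added_ul):
    """Concentration after adding `v_added_ul` of diluent to
    `v_initial_ul` of solution at `c_initial` (same units in and out)."""
    return c_initial * v_initial_ul / (v_initial_ul + v_added_ul)


# 20 uL reaction at 2.5 units, diluted with 30 uL water -> 50 uL final.
print(diluted_concentration(2.5, 20.0, 30.0))
```

The factor 20/50 = 0.4 is what takes a stock of 2.5 units down to 1 unit.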



Results

As the major limitation to expression of
the wild type BPV L1 and L2 genes appeared to be
translational in our system, we wished to test whether
this limitation reflected a limited availability of
the appropriate tRNA species for gene translation. As
transient expression of the synthetic genes within
intact cells may be regulated by many factors, we
tested our hypothesis in a cell free system using
rabbit reticulocyte lysate (RRL) or wheat germ lysate
to examine gene translation. Similar amounts of
plasmids expressing the wild type or synthetic
humanized BPV1 L1 gene were added to a T7 RNA
polymerase-coupled RRL transcription/translation
system in the presence of ³⁵S-methionine. After 20
minutes, translated proteins were separated by SDS
PAGE and visualized by autoradiography. Efficient
translation of the modified L1 gene was observed
(Figure 4, top panel, lane 2), while translation of
the wild type BPV1 L1 sequence resulted in a weak 55
kDa L1 band (Figure 4, top panel, lane 1). We
reasoned that although the wild type sequence was not
optimized for translation in RRL, some translation
would occur as there would be no cellular mRNA species
competing for the tRNAs that read the 'rare' codons
present in the wild type L1 sequence. The above data
suggest that the observed difference in efficiency of
translation of the wild type and synthetic humanized
L1 genes is a consequence of limited availability of
the tRNAs required for translation of the rare codons
present in the wild type gene. We therefore expected
that addition of excess tRNA to the in vitro
translation system would overcome the inhibition of
translation of the wild type L1 gene. To address this
question, 10⁻⁵ M aminoacyl-tRNAs from yeast were added
into the RRL translation system, and L1 protein
synthesis was assessed. Introduction of exogenous
tRNAs resulted in a dramatic improvement in
translation of the wild type L1 sequence, which now
gave a yield of L1 protein comparable to that observed
with the synthetic humanized L1 sequence (SEQ ID NO: 3)
(Figure 4, top panel). Enhancement of translation of
the wild type L1 gene (SEQ ID NO: 1) by aminoacyl-tRNA
was dose-dependent, with an optimum efficiency at
10⁻⁵ M tRNA. As addition of exogenous tRNA improved
the yield of L1 protein translated from the wild type
L1 gene sequence (SEQ ID NO: 1), we assessed the speed
of translation of wild type and humanized L1 mRNA.
Samples were collected from the translation mixture
every 2 minutes, starting at the 8th minute.
Translation of L1 (SEQ ID NOS: 2, 4) from the wild type
sequence (SEQ ID NO: 1) was much slower than from the
humanized L1 sequence (SEQ ID NO: 3) (Figure 4, bottom
panel), and the retardation of translation could be
completely overcome by adding exogenous tRNA from
commercially available yeast tRNA. Yeast tRNA was
chosen in the above analysis because the codon usage
in yeast is similar to that of papillomavirus
(Table 1). Addition of exogenous tRNA did not
significantly improve the translation of the humanized
L1 gene (SEQ ID NO: 3), indicating that this sequence
was optimized with regard to codon usage for the
rabbit reticulocyte translation machinery (Figure 4,
bottom panel). In separate experiments we established
that wt L1 translation could also be enhanced by liver
tRNA (Figure 4), and by tRNAs extracted from bovine
skin epidermis, which presumably constitutes a mixture
of tRNAs from differentiated and undifferentiated
cells (data not shown).

Translation of wild type L1 is efficient in wheat germ
extract.

To further test our hypothesis that tRNA
availability is a determinant of expression of the
wild type BPV1 L1 gene (SEQ ID NO: 1), we examined the
translation of L1 in a cell type in which a quite
different set of tRNAs would be available. In a wheat
germ translation system, wild type L1 mRNA was
translated as efficiently as humanized L1 mRNA, and
addition of exogenous aminoacyl-tRNAs did not improve
the translation efficiency of either wild type or
humanized sequences (Figure 4, bottom panel). This
indicated that wheat germ contains sufficient amounts
of the tRNAs which are limiting for translation of the
wild type L1 sequence in RRL to allow efficient L1
translation.

EXAMPLE 4

Modified late genes can be expressed in
undifferentiated cells from papillomavirus promoter(s)

While our data presented above indicate
that translation is limiting for the production of
BPV1 capsid proteins in our test system, these
experiments were conducted in systems which are not
truly representative of the viral late gene
transcription from the BPV genome, in part because the
genes were driven by a strong CMV promoter. We
therefore wished to establish whether synthetic
humanized BPV capsid protein mRNA would be translated
more efficiently than the wild type mRNA, if
transcribed from the natural BPV1 promoter. This
would establish whether translation was indeed one of
the limiting factors for expression of BPV1 late genes
driven from the natural cryptic late gene promoter in
an undifferentiated cell. The BPV genome was cleaved
at nt 4450 and 6958 with BamHI/HindIII and the
original L1 (nt 4186-5595) and L2 (5068-7095) ORFs
were removed. The synthetic humanized L2 gene (SEQ ID
NO: 7), together with an SV40 ori sequence to allow
plasmid replication in eukaryotic cells, was inserted
into the BPV genome lacking L1/L2 ORF sequences. This
plasmid (Figure 5A) was designated pCICR1. A similar
plasmid was constructed with wild type (SEQ ID NO: 5)
rather than synthetic humanized L2 and designated
pCICR2. Cos-1 cells were transfected with these
plasmids and L2 protein expression examined by
immunofluorescence of transfected cells. Synthetic
humanized L2 (SEQ ID NO: 7), driven by the natural
BPV-1 promoter, was efficiently expressed, whereas the
wild type L2 sequence (SEQ ID NO: 5), driven from a
similar construct, produced no immunoreactive L2
protein (SEQ ID NOS: 6, 8) (Figure 5B). As
undifferentiated cells supported the expression of the
humanized L2 gene (SEQ ID NO: 7) but not the wild type
L2 (SEQ ID NO: 5) expressed from the cryptic late BPV
promoter, the results confirmed our earlier
observations from experiments using the CMV promoter.
However, the plasmids tested here contained SV40 ori,
designed to replicate the DNA in Cos cells. The
increased copy number of the BPV1 L2 plasmids or the
transcriptional enhancing activity of the SV40 ori
might explain in part the increased efficiency of
expression of L2 in this experimental system when
compared with infected skin. However, the marked
difference in expression between the natural and
humanized genes seen with a CMV promoter construct is
still observed with the natural promoter.



EXAMPLE 5

Substitution of papillomavirus-preferred codons
prevents translation but not transcription of a
non-papillomavirus gene in undifferentiated cells.

Materials and Methods

Codon replacement in gfp gene

To construct a modified gfp gene (SEQ ID
NO: 11) using papillomavirus preferred codons (PGFP), 6
pairs of oligonucleotides were synthesized. Each pair
of oligonucleotides has restriction sites incorporated
and was used to amplify gfp using a humanized gfp gene
(SEQ ID NO: 9) (GIBCO) as template. The PCR fragments
were ligated into the pUC18 vector to produce pUCPGFP.
The PGFP gene was sequenced, and cloned into the BamHI
site of the same mammalian expression vector, pCDNA3,
under the CMV promoter. The DNA and deduced amino
acid sequences of the humanized GFP gene are shown in
Figure 1C. Mutations introduced into the wild type
gfp gene (SEQ ID NO: 9) to produce the Pgfp gene (SEQ
ID NO: 11) are indicated above the corresponding
nucleotides of the wild-type sequence.

To further confirm that codon usage can
alter gene expression in mammalian cells, we made a
further variant on a synthetic gfp gene modified for
optimal expression in eukaryotic cells (Zolotukhin, et
al., 1996, J. Virol. 70:4646-4654). In our variant,
codons optimized for expression in eukaryotic cells
were substituted by those preferentially used in
papillomavirus late genes. Of 240 codons in the
humanized gfp gene (SEQ ID NO: 9), which expresses high
levels of fluorescent protein in cultured cells, 156
were changed to the corresponding papillomavirus late
gene-preferred codons to produce a new gfp gene (SEQ
ID NO: 11) designated Pgfp. Expression of Pgfp (SEQ ID
NO: 11) in undifferentiated cells was compared with
that of humanized gfp (SEQ ID NO: 9). Cos-1 cells
transfected with the humanized gfp (SEQ ID NO: 9)
produced a bright fluorescent signal after 24 hrs,
while cells transfected with Pgfp (SEQ ID NO: 11)
produced only a faint fluorescent signal (Figure 6A).
To confirm that this difference reflected differing
translational efficiency, gfp specific mRNA was tested
in both transfections and found not to be
significantly different (Figure 6B). Thus, codon
usage and corresponding tRNA availability apparently
determine the observed restriction of expression of
PV late genes, and modification of codon usage in
other genes similarly prevents their expression in
undifferentiated cells.

EXAMPLE 6

PGFP with papillomavirus-preferred codons is
efficiently expressed in vivo in differentiated mouse
keratinocytes.

Materials and Methods

Delivery of plasmid DNA into mouse skin by gene gun

Fifty micrograms of DNA was coated onto 25
µg gold micro-carriers by calcium precipitation,
following the manufacturer's instructions (Bio-Rad).
C57/bl mouse skin was bombarded with gold particles
coated with DNA plasmid at a pressure of 600 psi.
Serial sections were taken from the skin and examined
for distribution of the particles, confirming that a
pressure of 600 psi could deliver particles throughout
the epidermis.

Results

Mice were shot with gold beads carrying
PGFP DNA plasmid and, 24 hrs later, skin samples were
cut from the site of DNA delivery and examined for
expression of GFP protein (SEQ ID NOS: 10, 12).
Fluorescence was detected mostly in upper keratinocyte
layers, representing the differentiated epithelium,
and was not seen in undifferentiated basal cells. In
contrast, skin sections shot with the humanized GFP
plasmid showed fluorescence in cells randomly
distributed throughout the whole epidermis (Figure 7).
Although GFP-positive cells were rare in both PGFP-
(SEQ ID NO: 11) and GFP-inoculated (SEQ ID NO: 9) mouse
skin, fluorescence was observed only in differentiated
strata in the PGFP sample (SEQ ID NO: 11), whereas
fluorescence was observed throughout the epidermis in
GFP-inoculated (SEQ ID NO: 9) mouse skin. This result
confirmed that the use of papillomavirus-preferred
codons resulted in the protein being expressed in an
epithelial differentiation-dependent manner.

EXAMPLE 7

Microinjection of yeast tRNA and wild type
L1 gene into cultured cells

To test if yeast tRNA could facilitate
expression of wild type BPV-1 L1 (SEQ ID NO: 1) (as
yeast uses a similar set of codons to those observed
in papillomavirus for its own genes), 2 pL of mixtures
containing tRNA (2 mg/mL) (purified yeast tRNA
(Boehringer Mannheim) or bovine liver tRNA control)
and BPV L1 DNA (2 µg/mL) can be injected into CV-1
cells (Lu and Campisi, 1992, Proc. Natl. Acad.
Sci. U.S.A. 89 3889-3893). The injected cells can
then be cultured for 48 hrs at 37 °C and examined for
expression of L1 gene by standard immunofluorescence
methods using BPV L1-specific antibody and quantified
by FACS analysis (Qi et al 1996, Virology 216 35-45).

EXAMPLE 8

Establishment of a cell line which can
continuously produce HPV virus particles

To produce infectious PV, various methods
have been tried including the epithelial raft culture
system (Dollard et al 1992, Genes Dev 6 1131-1142),
and cell lines containing BPV-1 episomal DNA, and
infected by BPV-1 L1/L2 recombinant vaccinia (Zhou et
al 1993, J. Gen. Virol. 74 763-768) or transfected by
SFV RNA (Roden et al 1996, J. Virol. 70 5875-5883).
The yield of particles is in each case low. In a
reduction to practice of our discovery, synthetic BPV
L1 (SEQ ID NO: 3) and L2 genes (SEQ ID NO: 7) (as
described in Example 1) can be used to produce
infectious BPV in a cell line containing BPV-1
episomal DNA. Fibroblast cell lines (CON/BPV)
containing BPV-1 episomal DNA (Zhou et al 1993, J.
Gen. Virol. 74 763-768) can be used for transfection
of the synthetic BPV-1 L1 (SEQ ID NO: 3) and L2 genes
(SEQ ID NO: 7) under control of CMV promoter. BPV
particles may then be purified from the cell lysate
and the purified particles examined for the presence
of BPV-1 genome. Standard methods such as
transfection with lipofectamine (BRL) and G418
selection of transfected cells can be utilized to
generate suitable transfectants expressing humanized
L1 (SEQ ID NO: 3) and L2 (SEQ ID NO: 7) in the
background of BPV-1 episomal DNA. Examination of L1
and L2 protein expression can be performed using
rabbit anti-BPV L1 or rabbit anti-BPV L2 polyclonal
antibodies. BPV particles can then be purified using
our published methods (Zhou et al 1995, Virology 214
167-176) and can be characterized by electron
microscopy and DNA blotting. The infectivity of BPV
particles isolated from the cultured cells may be
tested in focus formation assays using C127
fibroblasts.

Method for extracting and measuring tRNA from tissues
Tissue (100 g) is homogenized in a Waring
Blender with 150 mL of phenol (Mallinckrodt,
Analytical Reagent, 88%) saturated with water (15:3)
and 150 mL of 1.0 M NaCl, 0.005 M EDTA in 0.1 M Tris-
chloride buffer, pH 7.5. The homogenate was spun
for ten minutes at top speed in the International
clinical centrifuge and the upper layer was carefully
decanted off. To this aqueous layer, three volumes of
95% ethanol were added. The resultant precipitate was
spun down at top speed in the International clinical
centrifuge and resuspended in 250 mL of 0.1 M
Tris-chloride buffer, pH 7.5. This solution was added
(flow rate of 15-20 drops per minute) to a column (2 x
10 cm) of 2 g of DEAE-cellulose previously
equilibrated with cold 0.1 M Tris-chloride buffer, pH
7.5. The column was then washed with 1 L of Tris-
chloride buffer, pH 7.5 and the RNA eluted with 1.0 M
NaCl in 0.1 M Tris-chloride buffer, pH 7.5. The first
10 mL of NaCl solution were discarded as "hold-up."
Sufficient salt solution (60-80 mL) was then collected
until the optical density of the effluent was less
than three at 260 nm. This solution was extracted
twice with an equal volume of phenol saturated with
water and twice with ether. To the aqueous solution
containing the RNA, three volumes of 95% ethanol were
added and the solution was allowed to stand overnight
in the cold. The precipitate was spun down and washed
first with 80% and then twice with 95% ethanol and
dried in a vacuum. Approximately 60 mg of soluble
RNA were obtained from a 100-g lot of rat liver.

Quantitating tRNAs

The following nylon membranes are used:
Biodyne A and B (Pall). For the preparation of dot
blots, the tRNA samples (from 1 pg to 5 ng) are
denatured at 60 °C for 15 min in 1-5 µL of 15%
formaldehyde, 10 x SSC (SSC is NaCl 0.3 M, tri-sodium
citrate 0.03 M). The samples are spotted in 1 µL
aliquots onto the membranes that have been soaked for
15 min in deionized water and slightly dried between
two sheets of 3MM Whatman paper prior to the
application of the samples. The tRNAs are fixed
covalently to the membranes by ultraviolet
irradiation (10 min using an ultraviolet lamp at 254
nm and 100 W strength at a distance of 20 cm) and the
membranes are baked for 2-3 h at 80 °C.

A 5' end labelled synthetic
deoxyribo-oligonucleotide complementary to the A54-A73
sequence of the tRNA is used as a probe for the
hybridization experiments. Labelling of the
oligonucleotide is performed by direct phosphorylation
of the 5'-OH-ended probe.
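
Because each probe is a DNA oligonucleotide complementary to a stretch of the target tRNA, the RNA sequence it is expected to hybridize to is the reverse complement of the probe with T written as U. A minimal sketch follows, assuming the probe is given 5' to 3' as printed; the function name is illustrative and the example uses the 17-mer listed as SEQ ID NO: 34.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def probe_target_rna(probe_dna):
    """Return the RNA sequence (5'->3') that is perfectly complementary
    to a DNA probe written 5'->3'."""
    reverse_complement = probe_dna.upper().translate(_COMPLEMENT)[::-1]
    return reverse_complement.replace("T", "U")


# Probe listed as SEQ ID NO: 34.
print(probe_target_rna("TCAGAGTGTTCATTGGT"))
```

A helper of this kind is a convenient check that a probe and its intended tRNA region are in fact complementary before a blot is run.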

For hybridisation experiments, the
UV-irradiated membranes are first preincubated for 5 h at
50 °C in 50% deionized formamide, 5 x SSC, 1% SDS,
0.04% Ficoll, 0.04% polyvinylpyrrolidone and 250 µg/mL
of sonicated salmon sperm DNA using 5 mL of buffer for
100 cm² of membrane. Hybridization is finally
performed overnight at 50 °C in the above solution
(2.5 mL/100 cm²) where the labeled probe has been
added. After hybridization, the membranes are washed
twice in 2 x SSC, 0.1% SDS for 5 min at room
temperature, twice in 2 x SSC, 1% SDS for 30 min at 60
°C and finally in 0.1 x SSC, 0.1% SDS for 30 min at
room temperature. To detect the hybridized probes the
membranes are exposed for 16 h to Fuji XR film at -70 °C
with an intensifying screen.



Sequence of tRNA probes 

The sequences of the tRNA probes are as
follows:

Ala°°^:     5'-TAAGGACTGTAAGACTT  (SEQ ID NO: 13)
Arg«^:      5'-CGAGCCAGCCAGGAGTC  (SEQ ID NO: 14)
Asn*^=:     5'-CTAGATTGGCAGGT^TT  (SEQ ID NO: 15)
Asp*^":     5'-TAAGATATATAGATTAT  (SEQ ID NO: 16)
Cys^^:      5'-AAGTCTTAGTAGAGATT  (SEQ ID NO: 17)
Glu'^:      5'-TATTTCTACACAGCATT  (SEQ ID NO: 18)
Gln°^:      5'-CTAGGACAATAGGAATT  (SEQ ID NO: 19)
Gly°°*:     5'-TACTCTCTTCTGGGTTT  (SEQ ID NO: 20)
His^*'^:    5'-TGCCGTGACTCGGATTC  (SEQ ID NO: 21)
Ile'^'":    5'-TAGAAATAAGAGGGCTT  (SEQ ID NO: 22)
Leu"'':     5'-TACTTTTATTTGGATTT  (SEQ ID NO: 23)
Leu"^:      5'-TATTAGGGAGAGGATTT  (SEQ ID NO: 24)
Lys"^:      5'-TCACTATGGAGATTTTA  (SEQ ID NO: 25)
Lys'^:      5'-CGCCCAACGTGGGGCTC  (SEQ ID NO: 26)
Met*'°"^:   5'-TAGTACGGGAAGGATTT  (SEQ ID NO: 27)
Phe"^:      5'-TGTTTATGGGATACAAT  (SEQ ID NO: 28)
Pro^"':     5'-TCAAGAAGAAGGAGCTA  (SEQ ID NO: 29)
Pro«=^:     5'-GGGCTCGTCCGGGATTT  (SEQ ID NO: 30)
Ser*°=:     5'-ATAAGAAAGGAAGATCG  (SEQ ID NO: 31)
            5'-TGTCTTGAGAAGAGAAG  (SEQ ID NO: 32)
Tyr'^*"^:   5'-TGGTAAAAAGAGGATTT  (SEQ ID NO: 33)
Val°^^:     5'-TCAGAGTGTTCATTGGT  (SEQ ID NO: 34)

EXAMPLE 10

Comparison of the relative abundance of tRNA species
in undifferentiated and differentiated keratinocytes

Materials and Methods

Isolation of epidermal cells

2-day old mice were killed and their skins
removed. The skins were digested with 0.25% trypsin
PBS at 4 °C overnight. The epidermis was separated
from the dermis using forceps and minced with scissors
in 10% FCS DMEM medium. The cell suspension was first
filtered through a 1 mm and then a 0.2 mm nylon net.
The cell suspension was then pelleted and washed twice
with PBS.

Density gradient centrifugation

The keratinocytes were resuspended in 30%
Percoll and separated by centrifugation through a
discontinuous Percoll gradient (1.085, 1.075 and 1.050
g/mL) at 1200 x g at room temperature for 25 min. The
cells were then washed with PBS and used to extract
tRNA.

tRNA purification

The cells were lysed in 5 mL of lysis
buffer (0.2 M NaOH, 1% SDS) for 10 min at room
temperature. The lysate was neutralized with 5 mL of
3.0 M potassium acetate (pH 5.5). After
centrifugation, the supernatant was diluted with 3
volumes of 100 mM Tris (pH 7.5) and added to a DEAE
column equilibrated with 100 mM Tris (pH 7.5). An
equal volume of isopropanol was added to the aqueous
solution containing tRNA, and the solution was allowed
to stand overnight at 4 °C. The tRNA was spun down
and washed with 75% ethanol, then dissolved in
RNase-free water.



tRNA blotting

10 ng of each tRNA sample in 1 µL was
denatured at 60 °C for 15 min in 4 µL formaldehyde and
5 µL 20 x SSC. The samples were spotted in 1 µL
aliquots onto charged nylon membrane (Amersham), and
the tRNAs were fixed with UV and probed with
³²P-oligonucleotides.

Results 

Comparison of the abundance of the tRNA
species in undifferentiated and differentiated
keratinocytes showed that the levels of some tRNA
populations changed dramatically. For example, the
levels of tRNAs specific for Ala°°^, Leu^*^, Leu^^ were
increased in differentiated cells while tRNAs for
Arg^, Pro^^S Asn**° were more abundant in
undifferentiated keratinocytes (see Table 2).

25 

GENERAL DISCUSSION

In the present specification the inventors
have confirmed that one determinant of the efficiency
of translation of a gene in mammalian cells is its
codon composition. This observation has commonly been
made when genes from prokaryotic organisms have been
expressed in eukaryotic cells (Smith, D. W., 1996,
Biotechnol. Prog. 12:417-422). The present inventors
have also presented evidence that mRNAs encoding the
capsid genes of papillomavirus are not effectively
translated in cultured eukaryotic cells, apparently
because tRNA availability is rate limiting for
translation, and that the block to PV late gene
translation in eukaryotic cells in culture can be
overcome by altering the codon usage of the late genes
to match the consensus for mammalian genes, or
alternatively by providing exogenous tRNAs.
Alterations to mRNA secondary structure or protein
binding (Sokolowski, et al., 1998, J. Virol.
72:1504-1515) as a consequence of the changes to the
primary sequence of the PV capsid genes might
contribute to the observed differences in efficiency
of translation of the natural and modified PV capsid
gene mRNAs in cultured cells. However, the enhancement
of translation of the natural but not the modified
mRNA that was observed after addition of tRNA in a
mammalian in vitro translation system, which was not
observed in a plant translation system, strengthens
the argument that tRNA availability is rate limiting
for translation of the natural gene in mammalian
cells. A shortage of critical tRNAs could result in
slowed elongation of the nascent peptide or premature
termination of translation (Oba, et al., 1991,
Biochimie 73:1109-1112). Slowed elongation appears to
be the major consequence for the PV late gene.
Analysis of codon usage in the PV genome shows that PV
late genes use many codons that mammalian cells rarely
use. For example, PV frequently uses UUA for leucine,
CGU for arginine, ACA for threonine, and AUA for
isoleucine, whereas these codons are used
significantly less often in mammalian genes. In
contrast, papillomavirus late genes can be expressed
efficiently in yeast (Jansen, et al., 1995, Vaccine
13:1509-1514; Sasagawa, et al., 1995, Virology
206:126-135), and the codon composition of yeast and
papillomavirus genes is similar (Table 1). An apparent
exception is that PV L1 genes can be efficiently
expressed in insect cells (Kirnbauer, et al., 1992,
Proc. Natl. Acad. Sci. USA 89:12180-12184) using
recombinant baculovirus, or in various
undifferentiated mammalian cells using recombinant
vaccinia (Zhou, et al., 1991, Virology 185:251-257).
As infection with vaccinia or baculovirus
down-regulates cellular protein synthesis, the
efficient expression of the L1 capsid proteins under
these circumstances may occur because less cellular
mRNA is available in a virus-infected cell to compete
with the L1 mRNA for the rarer tRNAs.
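
The codon usage comparison described above can be illustrated with a short Python sketch. It is only an illustration: the dictionary holds a few per-thousand frequencies transcribed from the Human and BPV columns of Table 1, the names TABLE1, usage_ratio and overused_in_bpv are hypothetical, and the two-fold threshold is an arbitrary assumption rather than a value taken from the specification.

```python
# Per-thousand codon frequencies transcribed from Table 1 as (human, BPV).
# Only a handful of clearly legible entries are included for illustration.
TABLE1 = {
    "ACA": (14.4, 37.3),  # Thr
    "AUA": (5.8, 22.7),   # Ile
    "ACU": (12.7, 28.0),  # Thr
    "UUC": (22.6, 7.2),   # Phe
    "UGC": (14.5, 5.2),   # Cys
}


def usage_ratio(codon):
    """Return BPV frequency divided by human frequency for one codon."""
    human, bpv = TABLE1[codon]
    return bpv / human


def overused_in_bpv(threshold=2.0):
    """List codons used at least `threshold` times more often in BPV."""
    return sorted(c for c in TABLE1 if usage_ratio(c) >= threshold)


if __name__ == "__main__":
    for codon in sorted(TABLE1):
        print(f"{codon}: BPV/human = {usage_ratio(codon):.2f}")
    print("Over-represented in BPV:", overused_in_bpv())
```

On these entries the threonine codon ACA and the isoleucine codon AUA, both named in the text, come out as over-represented in the BPV late genes relative to human genes.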

Codon composition could be a more general
determinant of gene expression within different stages
of differentiation of the same tissue. Although the
genetic code is essentially universal, different
organisms show differences in codon composition of
their genes, while the codon composition of genes
tends to be relatively similar for all genes within
each organism, and matched to the population of
iso-tRNAs for that organism (Ikemura, T., 1981, J.
Mol. Biol. 146:1-21). However, populations of tRNAs in
differentiating and neoplastic cells are different
(Kanduc, D., 1997, Arch. Biochem. Biophys. 342:1-6;
Yang, and Comb, 1968, J. Mol. Biol. 31:138-142; Yang,
and Novelli, 1968, Biochem. Biophys. Res. Commun.
31:534-539) and the tRNA populations also vary in
cells growing under different growth conditions (Doi,
et al., 1968, J. Biol. Chem. 243:945-951).
Accordingly, the inventors believe that codon
composition and tRNA availability together provide a
primitive mechanism for spatial and/or temporal
regulation of gene expression. It is well recognized
that the G+C content of many dsDNA viruses, a crude
marker for viral gene codon composition, is markedly
different from the G+C content of the DNA of the cells
they infect (Strauss, et al., 1995, "Virus Evolution"
in Virology (eds. Fields, B. N., et al.),
Lippincott-Raven, Philadelphia, pp 153-171). Viruses
may therefore have evolved to take advantage of codon
composition to regulate their own program of gene
expression, perhaps to avoid expression of lethal
quantities of viral proteins in undifferentiated cells
where the virus utilizes the cellular machinery to
replicate its genome.
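
By way of illustration only, G+C content as a crude marker of codon composition can be computed as in the following Python sketch. The two fragments are the first 16 codons of SEQ ID NO:1 (wild-type BPV1 L1) and SEQ ID NO:3 (humanized L1) as they appear in the sequence listing, with the assumption that the characters "e" and "o" in the scanned wild-type line are misread "c". The function name gc_fraction is hypothetical.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.replace(" ", "").lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


# First 16 codons of SEQ ID NO:1 (wild type); OCR repair of the scanned
# line is an assumption of this sketch.
WILD_TYPE = "atg gcg ttg tgg caa caa ggc cag aag ctg tat ctc cct cca acc cct"
# First 16 codons of SEQ ID NO:3 (humanized).
HUMANIZED = "atg gcc ctg tgg cag cag ggc cag aag ctg tac ctg ccc cct acc ccc"

if __name__ == "__main__":
    print(f"wild type G+C: {gc_fraction(WILD_TYPE):.4f}")
    print(f"humanized G+C: {gc_fraction(HUMANIZED):.4f}")
```

Over this short window the humanized fragment has the higher G+C fraction, consistent with the substitution of codons used at relatively high frequency by human genes; a 48-nucleotide window is of course too short to characterize a whole gene.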

As the inventors' observations represent an
apparently novel mechanism of regulation of gene
translation within a single tissue, it is relevant to
consider how this relates to previously proposed
hypotheses for the restriction of expression of PV
late genes to differentiated epithelium. A number of
explanations have been proposed for the observation
that PV late genes are only effectively expressed in
differentiated epithelium. Reduced late gene
transcription may reflect dependence of transcription
from the late promoter on transcription factors
expressed only in differentiated epithelium, or may
alternatively be due to suppression of late promoter
transcription by viral (Stubenrauch, et al., 1996, J.
Virol. 70:119-126) or cellular gene products expressed
in undifferentiated cells. The "late" promoters of
HPV31b and of HPV5 (Haller, et al., 1995, Virology
214:245-255; Hummel, et al., 1992, J. Virol.
66:6070-6080) are described as differentiation
dependent, although the search for relevant
transcription control factors in differentiated
keratinocytes by conventional footprinting and DNA
binding studies has to date been unrewarding. Our data
show that capsid proteins are not translated from PV
L1 and L2 mRNAs in cells transfected with CMV
promoter-based expression vectors (Fig. 2), suggesting
that, in addition to any transcriptional controls that
may exist, there is a post-transcriptional block to
capsid protein synthesis in undifferentiated cells.
Sequences resembling 5' splice donor sites exist
within L1 or L2 mRNA or within flanking untranslated
message which are inhibitory to transcription of genes
with which they are associated (Kennedy, et al., 1991,
J. Virol. 65:2093-2097; Furth, et al., 1994, Mol.
Cell. Biol. 14:5278-5289). Other AU-rich sequences in
L1 or L2 mRNA promote mRNA degradation (Sokolowski, et
al., 1997, Oncogene 15:2303-2319). These mechanisms
inhibiting L1 and L2 expression in undifferentiated
cells have yet to be shown to be inactive in
differentiated epithelium, to explain the successful
translation of late genes in this tissue.

Because inhibitory RNA sequences within the
L1 coding sequence could have been rendered
non-functional by the systematic codon substitution
employed in the experiments described herein and the
untranslated inhibitory sequences were not included in
the inventors' test system, the respective roles of
inhibitory sequences and codon mismatch in suppression
of PV late gene expression in cultured mammalian cells
cannot be determined. However, regulatory sequences
promoting RNA degradation or inhibiting translation
are presumed to act through interaction with nuclear
or cytoplasmic proteins (Sokolowski, et al., 1998, J.
Virol. 72:1504-1515), and inefficient translation of
native sequence L1 mRNA was observed in a cell-free
translation system from anucleate cells, demonstrating
that codon composition of the PV late genes must play
some role in regulation of PV late gene translation.

Further evidence supporting the hypothesis
that codon composition is an important determinant of
PV capsid gene expression was gathered from an
analysis of the 84 PV L1 sequences currently available
in GenBank. The codon composition of the L1 genes,
and particularly the frequency of usage of the rarer
codons, was essentially the same across all the
published sequences (data not shown), as would be
predicted by the similar G+C content of the
papillomavirus genomes. The PV L1 gene is relatively
conserved at the amino acid level, showing 60-80%
amino acid homology between PV genotypes, as might be
expected from the constraints on capsid protein
function. There are, however, no obvious constraining
influences on the codon composition of the PV late
genes beyond those of the inventors' hypothesis, as
the late gene region does not code for other genes,
either in other reading frames or on the complementary
DNA strand, and has no known cis-acting regulatory
functions. If codon composition of the capsid genes
were not important for PV function, a considerable
heterogeneity of codon usage might therefore be
expected, given the evolutionary diversity of PVs
(Chan, et al., 1995, J. Virol. 69:3074-3083).
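
A comparison of codon composition between coding sequences of the kind described above could be sketched as follows. This is not the inventors' analysis, which is reported only as data not shown; the two fragments are hypothetical toy sequences rather than GenBank entries, and the function names and the simple absolute-difference measure are assumptions chosen for brevity.

```python
from collections import Counter


def codon_usage_per_thousand(seq):
    """Return {codon: occurrences per 1000 codons} for an in-frame sequence."""
    seq = seq.replace(" ", "").upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def usage_distance(a, b):
    """Sum of absolute per-thousand differences over codons seen in a or b."""
    return sum(abs(a.get(c, 0.0) - b.get(c, 0.0)) for c in set(a) | set(b))


# Hypothetical toy fragments encoding the same peptide (Leu-Arg-Thr-Ile-Leu-Thr).
frag_a = "UUA CGU ACA AUA UUA ACA"
frag_b = "CUG CGC ACC AUC CUG ACC"

if __name__ == "__main__":
    usage_a = codon_usage_per_thousand(frag_a)
    usage_b = codon_usage_per_thousand(frag_b)
    print(usage_a)
    print("distance:", usage_distance(usage_a, usage_b))
```

A distance near zero between two L1 genes would correspond to the essentially identical codon composition reported above, whereas sequences that encode similar proteins with different synonymous codons, as in the toy fragments, give a large distance.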

Taken together, the data and evidence
outlined herein make a strong case that codon usage
is a significant determinant of expression of PV late
genes in undifferentiated and differentiated
epithelial cells, and that this observation is
generalizable. Determining the relative roles of
message instability and codon mismatch in expression
in differentiated tissues will require comparisons of
transcriptional activity and translation of the L1 or
L2 genes driven from strong constitutive promoters in
differentiated and undifferentiated epithelium. Such
work should now be feasible using either transgenic
technology or keratinocyte raft cultures.

Although mechanisms of transcriptional
regulation of PV L1 or L2 gene expression in the
superficial layer of differentiated epithelium have
been proposed (Zeltner et al., 1994, J. Virol.
68:3620; Brown, et al., 1995, Virology 214:259; Stoler
et al., 1992, Hum. Pathol. 23:117; Hummel et al.,
1995, J. Virol. 69:3381; Haller et al., 1995, Virology
214:245; Barksdale and Baker, 1993, J. Virol.
67:5605), measurable PV late gene mRNA is not always
associated with production of late proteins (Zeltner
et al., 1994, supra; Ozbun and Meyers, 1997, J. Virol.
71:5161), and the data presented here suggest that
translation regulation may play a major part in
controlling PV late gene expression. This observation
has implications as herein described for the
regulation of expression of genes related to the
specialised functions of any differentiated tissue,
and also for targeting of expression of therapeutic
genes to such tissue while avoiding the potentially
deleterious consequences of expression of the
exogenous gene in a self-renewing stem cell
population.
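
The principle of targeting expression by synonymous codon replacement can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the inventors' procedure: the standard genetic code is used, the input is the first five codons of SEQ ID NO:1, and the preferred codons (GCA for Ala, CTT for Leu, written as DNA) are taken from the codons recited in claim 5 for a differentiated target cell. The names CODON_TABLE, translate and recode are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def recode(dna, preferred):
    """Replace codons with the preferred synonymous codon, where one is given.

    `preferred` maps a one-letter amino acid to the codon to use for it.
    """
    dna = dna.replace(" ", "").upper()
    out = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        aa = CODON_TABLE[codon]
        new = preferred.get(aa, codon)
        if CODON_TABLE[new] != aa:
            raise ValueError(f"{new} is not synonymous with {codon}")
        out.append(new)
    return "".join(out)


# Assumed preference for a differentiated target cell (codons from claim 5).
PREFERRED = {"A": "GCA", "L": "CTT"}

if __name__ == "__main__":
    parent = "ATG GCG TTG TGG CAA"  # first five codons of SEQ ID NO:1
    synthetic = recode(parent, PREFERRED)
    print(synthetic, translate(synthetic))
```

Because every replacement is checked against the codon table, the synthetic sequence encodes the same protein as the parent sequence; only the codons, and therefore the iso-tRNAs required for translation, are changed.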

The present invention has been described in
terms of particular embodiments found or proposed by
the present inventors to comprise preferred modes for
the practice of the invention. Those of skill in the
art will appreciate that, in light of the present
disclosure, numerous modifications and changes may be
made in the particular embodiments exemplified without
departing from the scope of the invention. All such
modifications are intended to be included within the
scope of the appended claims.
TABLE 1

The codon usage data for human, cow, yeast
and wheat proteins are derived from published
results (18). The BPV1 data are from the sequences in
the GenBank database.

TABLE 2

Each iso-acceptor tRNA, with its anticodon shown
as a superscript, is shown in the top row. The "+"
symbols indicate the abundance of tRNA, wherein each
additional "+" indicates about a 10-fold increase.
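
The semi-quantitative scale of Table 2 can be converted into approximate fold differences as in the following Python sketch. It assumes that each additional "+" corresponds to exactly a ten-fold step, whereas the legend says only "about"; the scores for Ala, Pro and Asn are as read from Table 2, with the column alignment of the scanned table taken as an assumption, and the function name fold_change is hypothetical.

```python
def fold_change(supra, basal):
    """Approximate suprabasal/basal abundance ratio from '+' scores.

    Each additional '+' is treated as a ten-fold step, following the
    Table 2 legend.
    """
    return 10.0 ** (len(supra) - len(basal))


# (suprabasal score, basal score) as read from Table 2; alignment assumed.
SCORES = {
    "Ala": ("+++", "+"),
    "Pro": ("+", "+++"),
    "Asn": ("+", "+++"),
}

if __name__ == "__main__":
    for trna, (supra, basal) in SCORES.items():
        print(f"tRNA-{trna}: ~{fold_change(supra, basal):g}-fold supra/basal")
```

On this reading, the Ala iso-acceptor is roughly 100-fold more abundant in differentiated (suprabasal) cells, while the Pro and Asn iso-acceptors are roughly 100-fold more abundant in undifferentiated (basal) cells.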
TABLE 1

Frequency (per one thousand) of codon usage for individual
organisms.

Amino acid  Codon   Human   Cow     Yeast   Wheat   BPV L1/L2

ARG         CGA     5.4     5.5     -       -       7.9
            CGC     11.3    12.2    -       7.?     -
            CGG     10.4    11.2    1.1     4.6     -
            CGU     4.7     3.7     7.5     1.1     -
            AGA     9.9     9.9     24.0    4.1     14.4
            AGG     11.1    11.4    7.5     7.1     -
LEU         CUA     6.2     4.9     11.8    12.1    18.6
            CUC     19.9    21.2    4.1     18.6    6.2
            CUG     42.5    46.6    8.3     15.5    15.5
            CUU     10.7    10.6    9.6     6.5     20.7
            UUA     5.3     4.0     24.?    1.8     14.?
            UUG     11.0    -       -       -       ?
SER         UCA     9.3     7.6     -       14.6    ?
            UCC     17.7    17.6    14.4    10.1    11.4
            UCG     4.2     4.5     6.5     9.6     6.2
            UCU     13.2    11.2    24.6    14.8    15.5
            AGC     18.7    18.7    7.1     12.8    12.4
            AGU     9.4     8.6     11.7    12.9    21.7
THR         ACA     14.4    11.4    15.6    4.6     37.3
            ACC     23.0    21.1    13.9    15.9    19.7
            ACG     6.7     7.8     6.7     4.5     4.1
            ACU     12.7    9.6     22.0    11.8    28.0
PRO         CCA     14.6    12.0    21.4    71.2    22.8
            CCC     20.0    19.2    5.9     11.1    15.5
            CCG     6.5     7.9     4.1     19.4    0.0
            CCU     15.5    14.6    12.8    10.3    33.1
ALA         GCA     14.0    13.1    15.3    11.2    33.1
            GCC     29.1    35.8    15.5    19.5    17.6
            GCG     7.2     9.3     5.1     13.8    4.1
            GCU     19.6    19.1    28.3    9.6     13.5
GLY         GGA     17.1    16.2    8.9     25.9    22.8
            GGC     25.4    28.1    8.9     28.0    12.4
            GGG     17.3    19.2    5.1     28.5    22.8
            GGU     11.2    11.8    34.9    9.6     18.6
VAL         GUA     5.9     5.1     10.0    4.4     15.5
            GUC     16.3    18.4    14.9    14.8    6.2
            GUG     30.9    32.9    9.5     12.9    23.8
            GUU     10.4    9.9     26.6    11.6    16.6
LYS         AAA     22.2    21.6    37.7    4.5     37.2
            AAG     34.9    37.1    35.2    17.4    13.5
ASN         AAC     22.6    22.4    25.8    14.2    10.3
            AAU     16.6    12.5    31.4    6.7     24.8
GLN         CAA     11.1    9.7     29.8    171.8   22.8
            CAG     33.6    34.4    10.4    79.4    17.6
HIS         CAC     14.2    14.0    8.2     8.2     6.2
            CAU     9.3     7.5     12.3    7.1     13.4
GLU         GAA     26.8    24.4    48.9    7.8     36.2
            GAG     41.4    45.4    16.9    19.7    21.7
ASP         GAC     29.0    31.5    22.3    13.0    18.6
            GAU     21.7    19.2    37.0    4.0     33.1
TYR         UAC     18.8    20.3    16.5    24.5    17.6
            UAU     12.5    10.5    16.5    12.5    18.6
CYS         UGC     14.5    13.9    3.7     14.8    5.2
            UGU     9.9     9.4     7.6     4.9     5.2
PHE         UUC     22.6    25.5    20.0    14.1    7.2
            UUU     15.8    17.0    23.2    15.0    23.8
ILE         AUA     5.8     5.2     12.8    5.4     22.7
            AUC     24.3    25.8    18.4    19.7    8.2
            AUU     14.9    13.1    31.1    10.7    20.7
TABLE 2

tRNA population changes as keratinocytes (KC) start to
differentiate.

tRNA    ?     Ala   ?     Leu   Leu   Lys   Lys   ?     Pro
Supra   +     +++   +     +++   +++   ++    +     +     +
Basal   +++   +     ++    +     +     +     +     ++    +++

tRNA    Val   Val   His   Asn   ?     ?     Gly
Supra   ++    +     ++    +     +     +     +
Basal   +     +     +     +++   -     ++    +

WO 99/02694                                   PCT/AU98/00530

WHAT IS CLAIMED IS:


1. A synthetic nucleic acid sequence capable
of selectively expressing a protein in a target cell
or tissue of a mammal, wherein said selective
expression is effected by replacing at least one
existing codon of a parent nucleic acid sequence with
a synonymous codon to form said synthetic nucleic acid
sequence.

2. The nucleic acid sequence of claim 1,
wherein said synonymous codon corresponds to an
iso-tRNA which, when compared to an iso-tRNA
corresponding to the at least one existing codon, is
in higher abundance in the target cell or tissue
relative to one or more other cells or tissues of the
mammal.

3. The nucleic acid sequence of claim 1,
wherein said synonymous codon corresponds to an
iso-tRNA which, when compared to an iso-tRNA
corresponding to the at least one existing codon, is
in higher abundance in the target cell or tissue
relative to a precursor cell or tissue.

4. The nucleic acid sequence of claim 1,
wherein said synonymous codon corresponds to an
iso-tRNA which, when compared to an iso-tRNA
corresponding to the at least one existing codon, is
in higher abundance in the target cell or tissue
relative to a cell or tissue derived therefrom.

5. The nucleic acid sequence of claim 1,
wherein said synonymous codons for selective
expression of said protein are selected from the group
consisting of gca (Ala), cuu (Leu) and cua (Leu), and
said target is a differentiated cell.

6. The nucleic acid sequence of claim 5,
wherein said differentiated cell is a differentiated
keratinocyte.

7. The nucleic acid sequence of any one of
claims 2 to 4, wherein said corresponding iso-tRNA in
said target cell or tissue is at a level which is at
least 110%, preferably at least 200%, more preferably
at least 500%, and most preferably at least 1000%, of
that expressed in the or each other cell or tissue of
the mammal.

8. The nucleic acid sequence of claim 1,
wherein the synonymous codon may be selected from the
group consisting of (1) a codon used at relatively
high frequency by genes, preferably highly expressed
genes, of the target cell or tissue, (2) a codon used
at relatively high frequency by genes, preferably
highly expressed genes, of the or each other cell or
tissue, (3) a codon used at relatively high frequency
by genes, preferably highly expressed genes, of the
mammal, (4) a codon used at relatively low frequency
by genes of the target cell or tissue, (5) a codon
used at relatively low frequency by genes of the or
each other cell or tissue, (6) a codon used at
relatively low frequency by genes of the mammal, (7) a
codon used at relatively high frequency by genes of
another organism, and (8) a codon used at relatively
low frequency by genes of another organism.

9. The nucleic acid sequence of claim 1,
wherein the at least one existing codon and the
synonymous codon are selected such that said protein
is expressed from said synthetic nucleic acid sequence
in said target cell or tissue at a level which is at
least 110%, preferably at least 200%, more preferably
at least 500%, and most preferably at least 1000%, of
that expressed from said parent nucleic acid sequence
in said target cell or tissue.
10. A method for selectively expressing a
protein in a target cell or tissue of a mammal,
wherein said selective expression is effected by
replacing at least one existing codon of a parent
nucleic acid sequence with a synonymous codon to form
said synthetic nucleic acid sequence.

11. The method of claim 10, wherein said method
is further characterized by the steps of:

(a) replacing at least one existing codon
of a parent nucleic acid sequence encoding said
protein with a synonymous codon to produce a synthetic
nucleic acid sequence having altered translational
kinetics compared to said parent nucleic acid sequence
such that said protein is selectively expressible in
said target cell or tissue;

(b) administering to the mammal and
introducing into said target cell or tissue, or a
precursor cell or precursor tissue thereof, said
synthetic nucleic acid sequence operably linked to one
or more regulatory nucleotide sequences; and

(c) selectively expressing said protein
in said target cell or tissue.

12. The method of claim 11 further including,
prior to step (a):

(i) measuring relative abundance of
different iso-tRNAs in said target cell or tissue, and
in one or more other cells or tissues of the mammal;
and

(ii) identifying said at least one
existing codon and said synonymous codon based on said
measurement, wherein said synonymous codon corresponds
to an iso-tRNA which, when compared to an iso-tRNA
corresponding to the existing codon, is in higher
abundance in said target cell or tissue relative to
the or each other cell or tissue of the mammal.

13. The method of claim 12, wherein step (ii)
is further characterized in that said synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to the at least one existing
codon, is in higher abundance in the target cell or
tissue relative to a precursor cell or tissue.

14. The method of claim 12, wherein step (ii)
is further characterized in that said synonymous codon
corresponds to an iso-tRNA which, when compared to an
iso-tRNA corresponding to the at least one existing
codon, is in higher abundance in the target cell or
tissue relative to a cell or tissue derived therefrom.

15. The method of claim 11 further including,
prior to step (a), identifying said at least one
existing codon and said synonymous codon based on
respective relative frequencies of particular codons
used by genes selected from the group consisting of
(I) genes of the target cell or tissue, (II) genes of
the or each other cell or tissue, (III) genes of the
mammal, and (IV) genes of another organism.

16. A method for expressing a protein in a
target cell or tissue from a first nucleic acid
sequence including the steps of:

introducing into said target cell or
tissue, or a precursor cell or precursor tissue
thereof, a second nucleic acid sequence encoding at
least one isoaccepting transfer RNA wherein said
second nucleic acid sequence is operably linked to one
or more regulatory nucleotide sequences, and wherein
said at least one isoaccepting transfer RNA is
normally in relatively low abundance in said target
cell or tissue and corresponds to a codon of said
first nucleic acid sequence.

17. A method for producing a virus particle in
a cycling eukaryotic cell, said virus particle
comprising at least one protein necessary for assembly
of said virus particle, wherein said at least one
protein is not expressed in said cell from a parent
nucleic acid sequence at a level sufficient to permit
virus assembly therein, said method including the
steps of:

(a) replacing at least one existing codon
of said parent nucleic acid sequence with a synonymous
codon to produce a synthetic nucleic acid sequence
having altered translational kinetics compared to said
parent nucleic acid sequence such that said at least
one protein is expressible from said synthetic nucleic
acid sequence in said cell at a level sufficient to
permit virus assembly therein;

(b) introducing into said cell or a
precursor thereof said synthetic nucleic acid sequence
operably linked to one or more regulatory nucleotide
sequences; and

(c) expressing said at least one protein
in said cell in the presence of other viral proteins
required for assembly of said virus particle to
thereby produce said virus particle.

18. A method for producing a virus particle in
a cycling cell, said virus particle comprising at
least one protein necessary for assembly of said virus
particle, wherein said at least one protein is not
expressed in said cell from a parent nucleic acid
sequence at a level sufficient to permit virus
assembly therein, and wherein at least one existing
codon of said parent nucleic acid sequence is rate
limiting for the production of said at least one
protein to said level, said method including the step
of introducing into said cell a nucleic acid sequence
capable of expressing therein an isoaccepting transfer
RNA specific for said at least one codon.

19. A vector comprising a nucleic acid sequence
according to any of claims 1 to 9 wherein said
synthetic nucleic acid sequence is operably linked to
one or more regulatory nucleic acid sequences.

20. A pharmaceutical composition comprising a
nucleic acid sequence according to any of claims 1 to
9 together with a pharmaceutically acceptable carrier.

21. A pharmaceutical composition comprising a
vector according to claim 19 together with a
pharmaceutically acceptable carrier.

22. A cell or tissue comprising therein a
nucleic acid sequence according to any of claims 1 to
9.

23. A cell or tissue comprising therein a
vector according to claim 19.

24. A cell or tissue resulting from a method
according to any one of claims 10 to 18.

25. Virus particles produced from a method
according to claim 17 or 18.
SEQUENCE LISTING
<110> The University of Queensland 

<120> NUCLEIC ACID SEQUENCE AND METHOD FOR SELECTIVELY 
EXPRESSING A PROTEIN IN A TARGET CELL OR TISSUE

<130> Selective Expression

<140> PCT/AU98/00530
<141> 1998-07-09

<150> P07765
<151> 1997-07-09

<150> P09467
<151> 1997-09-11

<160> 34

<170> PatentIn Ver. 2.0

<210> 1
<211> 1488
<212> DNA
<213> Bovine papillomavirus type 1

<220>
<221> CDS
<222> (1)..(1488)

<220>
<223> L1 open reading frame (wild-type)

<400> 1

atg gcg ttg tgg caa caa ggc cag aag ctg tat ctc cct cca acc cct 48
Met Ala Leu Trp Gln Gln Gly Gln Lys Leu Tyr Leu Pro Pro Thr Pro
1 5 10 15

gta agc aag gtg ctt tgc agt gaa acc tat gtg caa aga aaa agc att 96
Val Ser Lys Val Leu Cys Ser Glu Thr Tyr Val Gln Arg Lys Ser Ile
20 25 30

ttt tat cat gca gaa acg gag cgc ctg cta act ata gga cat cca tat 144
Phe Tyr His Ala Glu Thr Glu Arg Leu Leu Thr Ile Gly His Pro Tyr
35 40 45

tac cca gtg tct atc ggg gcc aaa act gtt cct aag gtc tct gca aat 192
Tyr Pro Val Ser Ile Gly Ala Lys Thr Val Pro Lys Val Ser Ala Asn
50 55 60

cag tat agg gta ttt aaa ata caa cta cct gat ccc aat caa ttt gca 240
Gln Tyr Arg Val Phe Lys Ile Gln Leu Pro Asp Pro Asn Gln Phe Ala
65 70 75 80

cta cct gac agg act gtt cac aac cca agt aaa gag cgg ctg gtg tgg 288
Leu Pro Asp Arg Thr Val His Asn Pro Ser Lys Glu Arg Leu Val Trp
85 90 95

cca gtc ata ggt gtg cag gtg tcc aga ggg cag cct ctt gga ggt act 336
Pro Val Ile Gly Val Gln Val Ser Arg Gly Gln Pro Leu Gly Gly Thr
100 105 110

gta act ggg cac ccc act ttt aat gct ttg ctt gat gca gaa aat gtg 384
Val Thr Gly His Pro Thr Phe Asn Ala Leu Leu Asp Ala Glu Asn Val
115 120 125

aat aga aaa gtc acc acc caa aca aca gat gac agg aaa caa aca ggc 432
Asn Arg Lys Val Thr Thr Gln Thr Thr Asp Asp Arg Lys Gln Thr Gly
130 135 140

cta gat gct aag caa caa cag att ctg ttg cta ggc tgt acc cct gct 480
Leu Asp Ala Lys Gln Gln Gln Ile Leu Leu Leu Gly Cys Thr Pro Ala
145 150 155 160

gaa ggg gaa tat tgg aca aca gcc cgt cca tgt gtt act gat cgt cta 528
Glu Gly Glu Tyr Trp Thr Thr Ala Arg Pro Cys Val Thr Asp Arg Leu
165 170 175

gaa aat ggc gcc tgc cct cct ctt gaa tta aaa aac aag cac ata gaa 576
Glu Asn Gly Ala Cys Pro Pro Leu Glu Leu Lys Asn Lys His Ile Glu
180 185 190

gat ggg gat atg atg gaa att ggg ttt ggt gca gcc aac ttc aaa gaa 624
Asp Gly Asp Met Met Glu Ile Gly Phe Gly Ala Ala Asn Phe Lys Glu
195 200 205

att aat gca agt aaa tca gat cta cct ctt gac att caa aat gag atc 672
Ile Asn Ala Ser Lys Ser Asp Leu Pro Leu Asp Ile Gln Asn Glu Ile
210 215 220

tgc ttg tac cca gac tac ctc aaa atg gct gag gac gct gct ggt aat 720
Cys Leu Tyr Pro Asp Tyr Leu Lys Met Ala Glu Asp Ala Ala Gly Asn
225 230 235 240

agc atg ttc ttt ttt gca agg aaa gaa cag gtg tat gtt aga cac atc 768
Ser Met Phe Phe Phe Ala Arg Lys Glu Gln Val Tyr Val Arg His Ile
245 250 255

tgg acc aga ggg ggc tcg gag aaa gaa gcc cct acc aca gat ttt tat 816
Trp Thr Arg Gly Gly Ser Glu Lys Glu Ala Pro Thr Thr Asp Phe Tyr
260 265 270

tta aag aat aat aaa ggg gat gcc acc ctt aaa ata ccc agt gtg cat 864
Leu Lys Asn Asn Lys Gly Asp Ala Thr Leu Lys Ile Pro Ser Val His
275 280 285

ttt ggt agt ccc agt ggc tca cta gtc tca act gat aat caa att ttt 912
Phe Gly Ser Pro Ser Gly Ser Leu Val Ser Thr Asp Asn Gln Ile Phe
290 295 300

aat cgg ccc tac tgg cta ttc cgt gcc cag ggc atg aac aat gga att 960
Asn Arg Pro Tyr Trp Leu Phe Arg Ala Gln Gly Met Asn Asn Gly Ile
305 310 315 320

gca tgg aat aat tta ttg ttt tta aca gtg ggg gac aat aca cgt ggt 1008
Ala Trp Asn Asn Leu Leu Phe Leu Thr Val Gly Asp Asn Thr Arg Gly
325 330 335

act aat ctt acc ata agt gta gcc tca gat gga acc cca cta aca gag 1056
Thr Asn Leu Thr Ile Ser Val Ala Ser Asp Gly Thr Pro Leu Thr Glu
340 345 350

tat gat agc tca aaa ttc aat gta tac cat aga cat atg gaa gaa tat 1104
Tyr Asp Ser Ser Lys Phe Asn Val Tyr His Arg His Met Glu Glu Tyr
355 360 365

aag cta gcc ttt ata tta gag cta tgc tct gtg gaa atc aca gct caa 1152
Lys Leu Ala Phe Ile Leu Glu Leu Cys Ser Val Glu Ile Thr Ala Gln
370 375 380

act gtg tca cat ctg caa gga ctt atg ccc tct gtg ctt gaa aat tgg 1200
Thr Val Ser His Leu Gln Gly Leu Met Pro Ser Val Leu Glu Asn Trp
385 390 395 400

gaa ata ggt gtg cag cct cct acc tca tcg ata tta gag gac acc tat 1248
Glu Ile Gly Val Gln Pro Pro Thr Ser Ser Ile Leu Glu Asp Thr Tyr
405 410 415

cgc tat ata gag tct cct gca act aaa tgt gca agc aat gta att cct 1296
Arg Tyr Ile Glu Ser Pro Ala Thr Lys Cys Ala Ser Asn Val Ile Pro
420 425 430

gca aaa gaa gac cct tat gca ggg ttt aag ttt tgg aac ata gat ctt 1344
Ala Lys Glu Asp Pro Tyr Ala Gly Phe Lys Phe Trp Asn Ile Asp Leu
435 440 445

aaa gaa aag ctt tct ttg gac tta gat caa ttt ccc ttg gga aga aga 1392
Lys Glu Lys Leu Ser Leu Asp Leu Asp Gln Phe Pro Leu Gly Arg Arg
450 455 460

ttt tta gca cag caa ggg gca gga tgt tca act gtg aga aaa cga aga 1440
Phe Leu Ala Gln Gln Gly Ala Gly Cys Ser Thr Val Arg Lys Arg Arg
465 470 475 480

att agc caa aaa act tcc agt aag cct gca aaa aaa aaa aaa aaa taa 1488
Ile Ser Gln Lys Thr Ser Ser Lys Pro Ala Lys Lys Lys Lys Lys
485 490 495



<210> 2 
<211> 495 
<212> PRT 

<213> Bovine papillomavirus type 1 
<400> 2 

Met Ala Leu Trp Gin Gin Gly Gin Lys Leu Tyr Leu Pro Pro Thr Pro 
15 10 15 
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Val Ser Lys Val Leu Cys Ser Glu Thr Tyr Val Gin Arg Lys Ser lie 
20 25 30 

Phe Tyr His Ala Glu Thr Glu Arg Leu Leu Thr lie Gly His Pro Tyr 
35 40 45 

Tyr Pro Val Ser lie Gly Ala Lys Thr Val Pro Lys Val Ser Ala Asn 
50 55 60 

Gin Tyr Arg Val Phe Lys lie Gin Leu Pro Asp Pro Asn Gin Phe Ala 
65 70 75 80 

Leu Pro Asp Arg Thr Val His Asn Pro Ser Lys Glu Arg Leu Val Trp 
85 90 95 

Pro Val He Gly Val Gin Val Ser Arg Gly Gin Pro Leu Gly Gly Thr 
100 105 110 

Val Thr Gly His Pro Thr Phe Asn Ala Leu Leu Asp Ala Glu Asn Val 
115 120 125 

Asn Arg Lys Val Thr Thr Gin Thr Thr Asp Asp Arg Lys Gin Thr Gly 
130 135 140 

Leu Asp Ala Lys Gin Gin Gin He Leu Leu Leu Gly Cys Thr Pro Ala 
145 150 155 160 

Glu Gly Glu Tyr Trp Thr Thr Ala Arg Pro Cys Val Thr Asp Arg Leu 
165 170 175 

Glu Asn Gly Ala Cys Pro Pro Leu Glu Leu Lys Asn Lys His He Glu 
180 185 190 

Asp Gly Asp Met Met Glu He Gly Phe Gly Ala Ala Asn Phe Lys Glu 
195 200 205 

He Asn Ala Ser Lys Ser Asp Leu Pro Leu Asp He Gin Asn Glu He 
210 215 220 

Cys Leu Tyr Pro Asp Tyr Leu Lys Met Ala Glu Asp Ala Ala Gly Asn 
225 230 235 240 

Ser Met Phe Phe Phe Ala Arg Lys Glu Gin Val Tyr Val Arg His He 
245 250 255 

Trp Thr Arg Gly Gly Ser Glu Lys Glu Ala Pro Thr Thr Asp Phe Tyr 
260 265 270 

Leu Lys Asn Asn Lys Gly Asp Ala Thr Leu Lys He Pro Ser Val His 
275 280 285 

Phe Gly Ser Pro Ser Gly Ser Leu Val Ser Thr Asp Asn Gin He Phe 
290 295 300 

Asn Arg Pro Tyr Trp Leu Phe Arg Ala Gln Gly Met Asn Asn Gly Ile
305 310 315 320

Ala Trp Asn Asn Leu Leu Phe Leu Thr Val Gly Asp Asn Thr Arg Gly
325 330 335 

Thr Asn Leu Thr He Ser Val Ala Ser Asp Gly Thr Pro Leu Thr Glu 
340 345 350 

Tyr Asp Ser Ser Lys Phe Asn Val Tyr His Arg His Met Glu Glu Tyr 
355 360 365 

Lys Leu Ala Phe He Leu Glu Leu Cys Ser Val Glu He Thr Ala Gin 
370 375 380 

Thr Val Ser His Leu Gin Gly Leu Met Pro Ser Val Leu Glu Asn Trp 
385 390 395 400 

Glu He Gly Val Gin Pro Pro Thr Ser Ser He Leu Glu Asp Thr Tyr 
405 410 415 

Arg Tyr He Glu Ser Pro Ala Thr Lys Cys Ala Ser Asn Val He Pro 
420 425 430 

Ala Lys Glu Asp Pro Tyr Ala Gly Phe Lys Phe Trp Asn He Asp Leu 
435 440 445 

Lys Glu Lys Leu Ser Leu Asp Leu Asp Gin Phe Pro Leu Gly Arg Arg 
450 455 460 

Phe Leu Ala Gin Gin Gly Ala Gly Cys Ser Thr Val Arg Lys Arg Arg 
465 470 475 480 

He Ser Gin Lys Thr Ser Ser Lys Pro Ala Lys Lys Lys Lys Lys 
485 490 495 



<210> 3 
<211> 1488 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> CDS 

<222> (1)..(1488)

<220> 

<223> Description of Artificial Sequence: Bovine 
papillomavirus type 1 LI open reading frame 
(humanized) 

<220> 

<223> Wild-type codons replaced with synonymous codons
used at relatively high frequency by human genes

<400> 3 

atg gcc ctg tgg cag cag ggc cag aag ctg tac ctg ccc cct acc ccc 48 
Met Ala Leu Trp Gln Gln Gly Gln Lys Leu Tyr Leu Pro Pro Thr Pro
1 5 10 15

gtg agc aag gtg ctt tgc agt gaa acc tat gtg caa aga aaa agc att 96
Val Ser Lys Val Leu Cys Ser Glu Thr Tyr Val Gln Arg Lys Ser Ile
20 25 30 

ttt tat cat gca gaa acg gag cgc ctg etg acc ate gga cac ccc tat 144 
Phe Tyr His Ala Glu Thr Glu Arg Leu Leu Thr lie Gly His Pro Tyr 
35 40 45 

tac ccc gtg tec ate ggg gcc aag act gtg ect aag gtg tec gee aat 192 
Tyr Pro Val Ser lie Gly Ala Lys Thr Val Pro Lys Val Ser Ala Asn 
50 55 60 

eag tat agg gtg ttc aaa ate caa ctg cct gat ccc aat caa ttt gca 240 
Gin Tyr Arg Val Phe Lys lie Gin Leu Pro Asp Pro Asn Gin Phe Ala 
65 70 75 80 

etg ect gae agg acc gtg cac aac ccc age aaa gag egg etg gtg tgg 2 88 
Leu Pro Asp Arg Thr Val His Asn Pro Ser Lys Glu Arg Leu Val Trp 
85 90 95 

cea gtg ate ggc gtg eag gtg tec aga ggc cag ect etg ggc ggc ace 336 
Pro Val lie Gly Val Gin Val Ser Arg Gly Gin Pro Leu Gly Gly Thr 
100 105 110 

gtg act ggg cac ccc act ttt aat get ttg ctt gat gca gaa aat gtg 384 
Val Thr Gly His Pro Thr Phe Asn Ala Leu Leu Asp Ala Glu Asn Val 
115 120 125 

aat aga aaa gtc acc acc cag ace ace gac gac agg aaa cag aca ggc 432 
Asn Arg Lys Val Thr Thr Gin Thr Thr Asp Asp Arg Lys Gin Thr Gly 
130 135 140 

etg gat gee aag cag cag eag ate ctg etg ctg ggc tgt ace ect get 480 
Leu Asp Ala Lys Gin Gin Gin lie Leu Leu Leu Gly Cys Thr Pro Ala 
145 150 155 160 

gaa ggg gaa tat tgg aca aca gcc egt cea tgt gtg ace gae cgt eta 52 8 
Glu Gly Glu Tyr Trp Thr Thr Ala Arg Pro Cys Val Thr Asp Arg Leu 
165 170 175 

gaa aac ggc gcc tgc ect cct etg gag ctg aaa aac aag cac ate gaa 576 
Glu Asn Gly Ala Cys Pro Pro Leu Glu Leu Lys Asn Lys His lie Glu 
180 185 190 

gat ggg gat atg atg gaa att ggg ttt ggt gca gcc aac ttc aaa gaa 624 
Asp Gly Asp Met Met Glu lie Gly Phe Gly Ala Ala Asn Phe Lys Glu 
195 200 205 

att aat gca agt aaa tea gat eta cct etg gae ate caa aat gag ate 672 
lie Asn Ala Ser Lys Ser Asp Leu Pro Leu Asp lie Gin Asn Glu lie 
210 215 220 

tgc ctg tac ccc gac tac ctg aaa atg gct gag gac gcc gcc ggc aac 720
Cys Leu Tyr Pro Asp Tyr Leu Lys Met Ala Glu Asp Ala Ala Gly Asn
225 230 235 240

agc atg ttc ttc ttc gcc agg aag gag cag gtg tac gtg aga cac atc 768
Ser Met Phe Phe Phe Ala Arg Lys Glu Gln Val Tyr Val Arg His Ile
245 250 255 

tgg acc aga ggc gge tec gag aaa gaa gcc cct acc aca gat ttt tat 816 
Trp Thr Arg Gly Gly Ser Glu Lys Glu Ala Pro Thr Thr Asp Phe Tyr 
260 265 270 

ttg aag aac aac aag gge gae gee aee etg aag ate cee age gtg eae 864 
Leu Lys Asn Asn Lys Gly Asp Ala Thr Leu Lys lie Pro Ser Val His 
275 280 285 

ttc gge age cec age gge tea eta gtg tec ace gae aac cag ate ttc 912 
Phe Gly Ser Pro Ser Gly Ser Leu Val Ser Thr Asp Asn Gin lie Phe 
290 295 300 

aac egg ccc tac tgg etg ttc cgc gcc cag ggc atg aac aat gga att 960 
Asn Arg Pro Tyr Trp Leu Phe Arg Ala Gin Gly Met Asn Asn Gly lie 
305 310 315 320 

gee tgg aac aac etg etg ttc etg acc gtg ggc gae aac aca egt gge 1008 
Ala Trp Asn Asn Leu Leu Phe Leu Thr Val Gly Asp Asn Thr Arg Gly 
325 330 335 

acc aac etg acc ate age gtg gee tec gat gga aee cea etg acc gag 1056 
Thr Asn Leu Thr lie Ser Val Ala Ser Asp Gly Thr Pro Leu Thr Glu 
340 345 350 

tat gat age teg aaa ttc aac gtg tac cac aga cac atg gag gag tat 1104 
Tyr Asp Ser Ser Lys Phe Asn Val Tyr His Arg His Met Glu Glu Tyr 
355 360 365 

aag eta gcc ttc ate etg gag etg tgc tec gtg gag ate acc gcc cag 1152 
Lys Leu Ala Phe lie Leu Glu Leu Cys Ser Val Glu lie Thr Ala Gin 
370 375 380 

acc gtg tec cat etg caa gga etg atg ccc tee gtg etg gag aat tgg 1200 
Thr Val Ser His Leu Gin Gly Leu Met Pro Ser Val Leu Glu Asn Trp 
385 390 395 400 

gag ate ggc gtg cag cee ccc acc tea teg ate ttg gag gae acc tac 1248 
Glu lie Gly Val Gin Pro Pro Thr Ser Ser lie Leu Glu Asp Thr Tyr 
405 410 415 

cgc tac ate gag tec ccc gcc acc aag tgt gee age aac gtg att cct 1296 
Arg Tyr lie Glu Ser Pro Ala Thr Lys Cys Ala Ser Asn Val lie Pro 
420 425 430 

gca aaa gaa gae cct tat gea ggg ttt aag ttc tgg aac ate gae etg 1344 
Ala Lys Glu Asp Pro Tyr Ala Gly Phe Lys Phe Trp Asn lie Asp Leu 
435 440 445 

aag gag aag ctg tct ctg gac ctg gat cag ttc ccc ttg ggc aga aga 1392
Lys Glu Lys Leu Ser Leu Asp Leu Asp Gln Phe Pro Leu Gly Arg Arg
450 455 460

ttt ctg gcc cag cag ggg gcc ggc tgt tcc acc gtg aga aaa cgc agg 1440
Phe Leu Ala Gln Gln Gly Ala Gly Cys Ser Thr Val Arg Lys Arg Arg
465 470 475 480

atc agc cag aag acc tcc agc aag ccc gcc aag aag aag aaa aag taa 1488
Ile Ser Gln Lys Thr Ser Ser Lys Pro Ala Lys Lys Lys Lys Lys
485 490 495 



<210> 4 
<211> 495 
<212> PRT 

<213> Artificial Sequence 
<400> 4 

Met Ala Leu Trp Gin Gin Gly Gin Lys Leu Tyr Leu Pro Pro Thr Pro 
15 10 15 

Val Ser Lys Val Leu Cys Ser Glu Thr Tyr Val Gin Arg Lys Ser lie 
20 25 30 

Phe Tyr His Ala Glu Thr Glu Arg Leu Leu Thr lie Gly His Pro Tyr 
35 40 45 

Tyr Pro Val Ser lie Gly Ala Lys Thr Val Pro Lys Val Ser Ala Asn 
50 55 60 

Gin Tyr Arg Val Phe Lys lie Gin Leu Pro Asp Pro Asn Gin Phe Ala 
65 70 75 80 

Leu Pro Asp Arg Thr Val His Asn Pro Ser Lys Glu Arg Leu Val Trp 
85 90 95 

Pro Val lie Gly Val Gin Val Ser Arg Gly Gin Pro Leu Gly Gly Thr 
100 105 110 

Val Thr Gly His Pro Thr Phe Asn Ala Leu Leu Asp Ala Glu Asn Val 
115 120 125 

Asn Arg Lys Val Thr Thr Gin Thr Thr Asp Asp Arg Lys Gin Thr Gly 
130 135 140 

Leu Asp Ala Lys Gin Gin Gin lie Leu Leu Leu Gly Cys Thr Pro Ala 
145 150 155 160 

Glu Gly Glu Tyr Trp Thr Thr Ala Arg Pro Cys Val Thr Asp Arg Leu 
165 170 175 

Glu Asn Gly Ala Cys Pro Pro Leu Glu Leu Lys Asn Lys His lie Glu 
180 185 190 

Asp Gly Asp Met Met Glu lie Gly Phe Gly Ala Ala Asn Phe Lys Glu 
195 200 205 

Ile Asn Ala Ser Lys Ser Asp Leu Pro Leu Asp Ile Gln Asn Glu Ile
210 215 220

Cys Leu Tyr Pro Asp Tyr Leu Lys Met Ala Glu Asp Ala Ala Gly Asn
225 230 235 240 

Ser Met Phe Phe Phe Ala Arg Lys Glu Gin Val Tyr Val Arg His lie 
245 250 255 

Trp Thr Arg Gly Gly Ser Glu Lys Glu Ala Pro Thr Thr Asp Phe Tyr 
260 265 270 

Leu Lys Asn Asn Lys Gly Asp Ala Thr Leu Lys lie Pro Ser Val His 
275 280 285 

Phe Gly Ser Pro Ser Gly Ser Leu Val Ser Thr Asp Asn Gin lie Phe 
290 295 300 

Asn Arg Pro Tyr Trp Leu Phe Arg Ala Gin Gly Met Asn Asn Gly lie 
305 310 315 320 

Ala Trp Asn Asn Leu Leu Phe Leu Thr Val Gly Asp Asn Thr Arg Gly 
325 330 335 

Thr Asn Leu Thr lie Ser Val Ala Ser Asp Gly Thr Pro Leu Thr Glu 
340 345 350 

Tyr Asp Ser Ser Lys Phe Asn Val Tyr His Arg His Met Glu Glu Tyr 
355 360 365 

Lys Leu Ala Phe lie Leu Glu Leu Cys Ser Val Glu lie Thr Ala Gin 
370 375 380 

Thr Val Ser His Leu Gin Gly Leu Met Pro Ser Val Leu Glu Asn Trp 
385 390 395 400 

Glu lie Gly Val Gin Pro Pro Thr Ser Ser lie Leu Glu Asp Thr Tyr 
405 410 415 

Arg Tyr lie Glu Ser Pro Ala Thr Lys Cys Ala Ser Asn Val lie Pro 
420 425 430 

Ala Lys Glu Asp Pro Tyr Ala Gly Phe Lys Phe Trp Asn lie Asp Leu 
435 440 445 

Lys Glu Lys Leu Ser Leu Asp Leu Asp Gin Phe Pro Leu Gly Arg Arg 
450 455 460 

Phe Leu Ala Gin Gin Gly Ala Gly Cys Ser Thr Val Arg Lys Arg Arg 
465 470 475 480 

lie Ser Gin Lys Thr Ser Ser Lys Pro Ala Lys Lys Lys Lys Lys 
485 490 495 



<210> 5 
<211> 1410 
<212> DNA 

<213> Bovine papillomavirus type 1

<220>

<221> CDS 

<222> (1)..(1410) 

<220> 

<223> L2 open reading frame (wild-type)
<400> 5 

atg agt gca cga aaa aga gta aaa cgt gcc agt gcc tat gac ctg tac 48 
Met Ser Ala Arg Lys Arg Val Lys Arg Ala Ser Ala Tyr Asp Leu Tyr 
1 5 10 15 

agg acc tgc aag caa gcg ggc aca tgt cca cca gat gtg ata cga aag 96 
Arg Thr Cys Lys Gin Ala Gly Thr Cys Pro Pro Asp Val lie Arg Lys 
20 25 30 

gta gaa gga gat act ata gca gat aaa att ttg aaa ttt ggg ggt ctt 144 
Val Glu Gly Asp Thr lie Ala Asp Lys He Leu Lys Phe Gly Gly Leu 
35 40 45 

gca ate tac tta gga ggg eta gga ata gga aca tgg tct act gga agg 192 
Ala He Tyr Leu Gly Gly Leu Gly He Gly Thr Trp Ser Thr Gly Arg 
50 55 60 

gtg gcc gca ggt gga tea cca agg tac aca cca etc cga aca gca ggg 240 
Val Ala Ala Gly Gly Ser Pro Arg Tyr Thr Pro Leu Arg Thr Ala Gly 
65 70 75 80 

tec aca tea teg ctt gca tea ata gga tec aga get gta aca gca ggg 288 
Ser Thr Ser Ser Leu Ala Ser He Gly Ser Arg Ala Val Thr Ala Gly 
85 90 95 

acc cgc ecc agt ata ggt gcg ggc att cct tta gac acc ctt gaa act 336 
Thr Arg Pro Ser He Gly Ala Gly He Pro Leu Asp Thr Leu Glu Thr 
100 105 110 

ctt ggg gcc ttg cgt cca ggg gtg tat gag gac act gtg eta cca gag 384 
Leu Gly Ala Leu Arg Pro Gly Val Tyr Glu Asp Thr Val Leu Pro Glu 
115 120 125 

gee cct gca ata gtc act cct gat get gtt cct gca gat tea ggg ctt 432 
Ala Pro Ala He Val Thr Pro Asp Ala Val Pro Ala Asp Ser Gly Leu 
130 135 140 

gat gee ctg tec ata ggt aca gac teg tec aeg gag acc etc att act 480 
Asp Ala Leu Ser He Gly Thr Asp Ser Ser Thr Glu Thr Leu He Thr 
145 150 155 160 

ctg eta gag cct gag ggt ecc gag gac ata gcg gtt ctt gag ctg caa 528 
Leu Leu Glu Pro Glu Gly Pro Glu Asp He Ala Val Leu Glu Leu Gin 
165 170 175 

ccc ctg gac cgt cca act tgg caa gta agc aat gct gtt cat cag tcc 576
Pro Leu Asp Arg Pro Thr Trp Gln Val Ser Asn Ala Val His Gln Ser
180 185 190

tct gca tac cac gcc cct ctg cag ctg caa tcg tcc att gca gaa aca 624
Ser Ala Tyr His Ala Pro Leu Gln Leu Gln Ser Ser Ile Ala Glu Thr
195 200 205 

tct ggt tta gaa aat att ttt gta gga ggc teg ggt tta ggg gat aca 672 
Ser Gly Leu Glu Asn lie Phe Val Gly Gly Ser Gly Leu Gly Asp Thr 
210 215 220 

gga gga gaa aac att gaa ctg aca tac ttc ggg tec cca cga aca age 720 
Gly Gly Glu Asn lie Glu Leu Thr Tyr Phe Gly Ser Pro Arg Thr Ser 
225 230 235 240 

acg ccc cgc agt att gcc tct aaa tea egt ggc att tta aac tgg ttc 768 
Thr Pro Arg Ser lie Ala Ser Lys Ser Arg Gly lie Leu Asn Trp Phe 
245 250 255 

agt aaa egg tac tac aca cag gtg ccc acg gaa gat cct gaa gtg ttt 816 
Ser Lys Arg Tyr Tyr Thr Gin Val Pro Thr Glu Asp Pro Glu Val Phe 
260 265 270 

tea tee caa aca ttt gca aac cca ctg tat gaa gca gaa cca get gtg 864 
Ser Ser Gin Thr Phe Ala Asn Pro Leu Tyr Glu Ala Glu Pro Ala Val 
275 280 285 

ctt aag gga cct agt gga cgt gtt gga etc agt cag gtt tat aaa cct 912 
Leu Lys Gly Pro Ser Gly Arg Val Gly Leu Ser Gin Val Tyr Lys Pro 
290 295 300 

gat aca ctt aca aca cgt age ggg aca gag gtg gga cca cag eta cat 960 
Asp Thr Leu Thr Thr Arg Ser Gly Thr Glu Val Gly Pro Gin Leu His 
305 310 315 320 

gtc agg tac tea ttg agt act ata cat gaa gat gta gaa gca ate ccc 1008 
Val Arg Tyr Ser Leu Ser Thr lie His Glu Asp Val Glu Ala lie Pro 
325 330 335 

tac aca gtt gat gaa aat aca cag gga ctt gca ttc gta ccc ttg cat 1056 
Tyr Thr Val Asp Glu Asn Thr Gin Gly Leu Ala Phe Val Pro Leu His 
340 345 350 

gaa gag caa gca ggt ttt gag gag ata gaa tta gat gat ttt agt gag 1104 
Glu Glu Gin Ala Gly Phe Glu Glu lie Glu Leu Asp Asp Phe Ser Glu 
355 360 365 

aca cat aga ctg eta cct cag aac ace tct tct aca cct gtt ggt agt 1152 
Thr His Arg Leu Leu Pro Gin Asn Thr Ser Ser Thr Pro Val Gly Ser 
370 375 380 

ggt gta cga aga age etc att cca act cga gaa ttt agt gca aca egg 1200 
Gly Val Arg Arg Ser Leu He Pro Thr Arg Glu Phe Ser Ala Thr Arg 
385 390 395 400 

cct aca ggt gtt gta acc tat ggc tca cct gac act tac tct gct agc 1248
Pro Thr Gly Val Val Thr Tyr Gly Ser Pro Asp Thr Tyr Ser Ala Ser
405 410 415

cca gtt act gac cct gat tct acc tct cct agt cta gtt atc gat gac 1296
Pro Val Thr Asp Pro Asp Ser Thr Ser Pro Ser Leu Val Ile Asp Asp
420 425 430 

act act act aca cca ate att ata att gat ggg cac aca gtt gat ttg 1344 
Thr Thr Thr Thr Pro He He He He Asp Gly His Thr Val Asp Leu 
435 440 445 

tac agc agt aac tac acc ttg cat ccc tcc ttg ttg agg aaa cga aaa 1392
Tyr Ser Ser Asn Tyr Thr Leu His Pro Ser Leu Leu Arg Lys Arg Lys
450 455 460

aaa cgg aaa cat gcc taa 1410
Lys Arg Lys His Ala 
465 470 



<210> 6 
<211> 469 
<212> PRT 

<213> Bovine papillomavirus type 1 
<400> 6 

Met Ser Ala Arg Lys Arg Val Lys Arg Ala Ser Ala Tyr Asp Leu Tyr 
15 10 15 

Arg Thr Cys Lys Gin Ala Gly Thr Cys Pro Pro Asp Val He Arg Lys 
20 25 30 

Val Glu Gly Asp Thr He Ala Asp Lys He Leu Lys Phe Gly Gly Leu 
35 40 45 

Ala He Tyr Leu Gly Gly Leu Gly He Gly Thr Trp Ser Thr Gly Arg 
50 55 60 

Val Ala Ala Gly Gly Ser Pro Arg Tyr Thr Pro Leu Arg Thr Ala Gly 
65 70 75 80 

Ser Thr Ser Ser Leu Ala Ser He Gly Ser Arg Ala Val Thr Ala Gly 
85 90 95 

Thr Arg Pro Ser He Gly Ala Gly He Pro Leu Asp Thr Leu Glu Thr 
100 105 110 

Leu Gly Ala Leu Arg Pro Gly Val Tyr Glu Asp Thr Val Leu Pro Glu 
115 120 125 

Ala Pro Ala He Val Thr Pro Asp Ala Val Pro Ala Asp Ser Gly Leu 
130 135 140 

Asp Ala Leu Ser He Gly Thr Asp Ser Ser Thr Glu Thr Leu He Thr 
145 150 155 160 

Leu Leu Glu Pro Glu Gly Pro Glu Asp Ile Ala Val Leu Glu Leu Gln
165 170 175

Pro Leu Asp Arg Pro Thr Trp Gln Val Ser Asn Ala Val His Gln Ser
180 185 190 

Ser Ala Tyr His Ala Pro Leu Gin Leu Gin Ser Ser lie Ala Glu Thr 
195 200 205 

Ser Gly Leu Glu Asn lie Phe Val Gly Gly Ser Gly Leu Gly Asp Thr 
210 215 220 

Gly Gly Glu Asn lie Glu Leu Thr Tyr Phe Gly Ser Pro Arg Thr Ser 
225 230 235 240 

Thr Pro Arg Ser lie Ala Ser Lys Ser Arg Gly lie Leu Asn Trp Phe 
245 250 255 

Ser Lys Arg Tyr Tyr Thr Gin Val Pro Thr Glu Asp Pro Glu Val Phe 
260 265 270 

Ser Ser Gin Thr Phe Ala Asn Pro Leu Tyr Glu Ala Glu Pro Ala Val 
275 280 285 

Leu Lys Gly Pro Ser Gly Arg Val Gly Leu Ser Gin Val Tyr Lys Pro 
290 295 300 

Asp Thr Leu Thr Thr Arg Ser Gly Thr Glu Val Gly Pro Gin Leu His 
305 310 315 320 

Val Arg Tyr Ser Leu Ser Thr He His Glu Asp Val Glu Ala He Pro 
325 330 335 

Tyr Thr Val Asp Glu Asn Thr Gin Gly Leu Ala Phe Val Pro Leu His 
340 345 350 

Glu Glu Gin Ala Gly Phe Glu Glu He Glu Leu Asp Asp Phe Ser Glu 
355 360 365 

Thr His Arg Leu Leu Pro Gin Asn Thr Ser Ser Thr Pro Val Gly Ser 
370 375 380 

Gly Val Arg Arg Ser Leu He Pro Thr Arg Glu Phe Ser Ala Thr Arg 
385 390 395 400 

Pro Thr Gly Val Val Thr Tyr Gly Ser Pro Asp Thr Tyr Ser Ala Ser 
405 410 415 

Pro Val Thr Asp Pro Asp Ser Thr Ser Pro Ser Leu Val He Asp Asp 
420 425 430 

Thr Thr Thr Thr Pro He He He He Asp Gly His Thr Val Asp Leu 
435 440 445 

Tyr Ser Ser Asn Tyr Thr Leu His Pro Ser Leu Leu Arg Lys Arg Lys 
450 455 460 

Lys Arg Lys His Ala
465



<210> 7
<211> 1410 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> CDS 

<222> (1)..(1410)

<220> 

<223> Description of Artificial Sequence: Bovine 
papillomavirus type 1 L2 open reading frame 
(humanized) 

<220> 

<223> Wild-type codons replaced with synonymous codons
used at relatively high frequency by human genes

<400> 7 

atg agc gcc cgc aag aga gtg aag cgc gcc agc gcc tac gac ctg tac 48
Met Ser Ala Arg Lys Arg Val Lys Arg Ala Ser Ala Tyr Asp Leu Tyr
1 5 10 15

agg acc tgc aag cag gcc ggc aca tgt cca cca gat gtg ate cga aag 96 
Arg Thr Cys Lys Gin Ala Gly Thr Cys Pro Pro Asp Val lie Arg Lys 
20 25 30 

gtg gag ggc gac acc ate gee gac aag ate ctg aag ttc ggc ggc ctg 144 
Val Glu Gly Asp Thr lie Ala Asp Lys lie Leu Lys Phe Gly Gly Leu 
35 40 45 

gee ate tac ctg ggc ggc ctg ggc ate gga aca tgg tct acc ggc agg 192 
Ala lie Tyr Leu Gly Gly Leu Gly lie Gly Thr Trp Ser Thr Gly Arg 
50 55 60 

gtg gcc gcc ggc ggc tca cca agg tac acc cca ctg cgc acc gcc ggc 240
Val Ala Ala Gly Gly Ser Pro Arg Tyr Thr Pro Leu Arg Thr Ala Gly 
65 70 75 80 

tec ace tec tec ctg gcc tec ate gga tec aga gcc gtg acc gcc ggg 288 
Ser Thr Ser Ser Leu Ala Ser lie Gly Ser Arg Ala Val Thr Ala Gly 
85 90 95 

ace cgc cec tec ate ggc geg ggc ate ect ctg gac ace ctg gaa act 336 
Thr Arg Pro Ser lie Gly Ala Gly lie Pro Leu Asp Thr Leu Glu Thr 
100 105 110 

ett ggg gcc ctg cgc ect ggc gtg tac gag gac acc gtg ctg ccc gaa 384 
Leu Gly Ala Leu Arg Pro Gly Val Tyr Glu Asp Thr Val Leu Pro Glu 
115 120 125 

gcc cct gcc atc gtg acc cct gac gcc gtg cct gca gac tcc ggc ctg 432
Ala Pro Ala Ile Val Thr Pro Asp Ala Val Pro Ala Asp Ser Gly Leu
130 135 140

gac gcc ctg tcc atc ggc aca gac tcc tcc acc gag acc ctg atc acc 480
Asp Ala Leu Ser Ile Gly Thr Asp Ser Ser Thr Glu Thr Leu Ile Thr
145 150 155 160 

ctg ctg gag cct gag ggc ecc gaa gac ata gcc gtg ctg gaa etc cag 528 
Leu Leu Glu Pro Glu Gly Pro Glu Asp lie Ala Val Leu Glu Leu Gin 
165 170 175 

ccc ctg gac cge eca ace tgg cag gtg age aat get gtg cac cag tec 576 
Pro Leu Asp Arg Pro Thr Trp Gin Val Ser Asn Ala Val His Gin Ser 
180 185 190 

tct gcc tac cac gcc cct etc cag ctg caa tec tec ate gcc gag aca 624 
Ser Ala Tyr His Ala Pro Leu Gin Leu Gin Ser Ser lie Ala Glu Thr 
195 200 205 

tet ggt tta gaa aat att ttt gta gga ggc teg ggt tta ggg gat ace 672 
Ser Gly Leu Glu Asn lie Phe Val Gly Gly Ser Gly Leu Gly Asp Thr 
210 215 220 

ggc ggc gag aac ate gag ctg acc tac ttc ggc tec ccc cgc acc age 720 
Gly Gly Glu Asn lie Glu Leu Thr Tyr Phe Gly Ser Pro Arg Thr Ser 
225 230 235 240 

acc ccc cgc tee ate gcc tec aag tec cgc ggc ate ctg aac tgg ttc 768 
Thr Pro Arg Ser lie Ala Ser Lys Ser Arg Gly lie Leu Asn Trp Phe 
245 250 255 

age aag egg tac tac acc cag gtg ccc acc gaa gat ccc gaa gtg ttc 816 
Ser Lys Arg Tyr Tyr Thr Gin Val Pro Thr Glu Asp Pro Glu Val Phe 
260 265 270 

tec tec cag ace ttc gee aac ccc ctg tac gag gcc gag ecc gee gtg 864 
Ser Ser Gin Thr Phe Ala Asn Pro Leu Tyr Glu Ala Glu Pro Ala Val 
275 280 285 

ctg aag ggc cct age ggc cgc gtg ggc ctg tec cag gtg tac aag cct 912 
Leu Lys Gly Pro Ser Gly Arg Val Gly Leu Ser Gin Val Tyr Lys Pro 
290 295 300 

gat acc ctg ace aca cgt age ggc aca gag gtg ggc ecc cag ctg cat 960 
Asp Thr Leu Thr Thr Arg Ser Gly Thr Glu Val Gly Pro Gin Leu His 
305 310 315 320 

gtg agg tac tec ctg tec acc ate cat gag gat gtg gag get ate ccc 1008 
Val Arg Tyr Ser Leu Ser Thr lie His Glu Asp Val Glu Ala lie Pro 
325 330 335 

tac acc gtg gat gag aac ace cag ggc ctg gee ttc gtg ccc ctg eat 1056 
Tyr Thr Val Asp Glu Asn Thr Gin Gly Leu Ala Phe Val Pro Leu His 
340 345 350 

gag gag cag gcc ggc ttc gag gag atc gag ctc gac gat ttc agc gag 1104
Glu Glu Gln Ala Gly Phe Glu Glu Ile Glu Leu Asp Asp Phe Ser Glu
355 360 365

acc cat cgc ctg ctg ccc cag aac acc tcc tcc acc ccc gtg ggc agc 1152
Thr His Arg Leu Leu Pro Gln Asn Thr Ser Ser Thr Pro Val Gly Ser
370 375 380 

ggc gtg cgc aga age ctg ate ect acc cga gag ttc age gcc acc egg 1200 
Gly Val Arg Arg Ser Leu lie Pro Thr Arg Glu Phe Ser Ala Thr Arg 
385 390 395 400 

ect ace ggc gtg gtg ace tac ggc tee ccc gae ace tae tec get age 1248 
Pro Thr Gly Val Val Thr Tyr Gly Ser Pro Asp Thr Tyr Ser Ala Ser 
405 410 415 

ccc gtg ace gac ect gat tct acc tct ect age ctg gtg ate gae gae 1296 
Pro Val Thr Asp Pro Asp Ser Thr Ser Pro Ser Leu Val lie Asp Asp 
420 425 430 

acc ace ace acc ccc ate ate ate ate gac ggc eac aca gtg gat ctg 1344 
Thr Thr Thr Thr Pro lie lie lie lie Asp Gly His Thr Val Asp Leu 
435 440 445 

tac age age aac tae ace ctg eat ccc tec ctg ctg agg aag cgc aag 1392 
Tyr Ser Ser Asn Tyr Thr Leu His Pro Ser Leu Leu Arg Lys Arg Lys 
450 455 460 

aag cgc aag cat gee taa 1410 
Lys Arg Lys His Ala 
465 470 



<210> 8 
<211> 469 
<212> PRT 

<213> Artificial Sequence 
<400> 8 

Met Ser Ala Arg Lys Arg Val Lys Arg Ala Ser Ala Tyr Asp Leu Tyr 
15 10 15 

Arg Thr Cys Lys Gin Ala Gly Thr Cys Pro Pro Asp Val lie Arg Lys 
20 25 30 

Val Glu Gly Asp Thr lie Ala Asp Lys lie Leu Lys Phe Gly Gly Leu 
35 40 45 

Ala lie Tyr Leu Gly Gly Leu Gly lie Gly Thr Trp Ser Thr Gly Arg 
50 55 60 

Val Ala Ala Gly Gly Ser Pro Arg Tyr Thr Pro Leu Arg Thr Ala Gly 
65 70 75 80 

Ser Thr Ser Ser Leu Ala Ser lie Gly Ser Arg Ala Val Thr Ala Gly 
85 90 95 

Thr Arg Pro Ser Ile Gly Ala Gly Ile Pro Leu Asp Thr Leu Glu Thr
100 105 110

Leu Gly Ala Leu Arg Pro Gly Val Tyr Glu Asp Thr Val Leu Pro Glu
115 120 125 

Ala Pro Ala lie Val Thr Pro Asp Ala Val Pro Ala Asp Ser Gly Leu 
130 135 140 

Asp Ala Leu Ser lie Gly Thr Asp Ser Ser Thr Glu Thr Leu lie Thr 
145 150 155 160 

Leu Leu Glu Pro Glu Gly Pro Glu Asp lie Ala Val Leu Glu Leu Gin 
165 170 175 

Pro Leu Asp Arg Pro Thr Trp Gin Val Ser Asn Ala Val His Gin Ser 
180 185 190 

Ser Ala Tyr His Ala Pro Leu Gin Leu Gin Ser Ser lie Ala Glu Thr 
195 200 205 

Ser Gly Leu Glu Asn lie Phe Val Gly Gly Ser Gly Leu Gly Asp Thr 
210 215 220 

Gly Gly Glu Asn lie Glu Leu Thr Tyr Phe Gly Ser Pro Arg Thr Ser 
225 230 235 240 

Thr Pro Arg Ser lie Ala Ser Lys Ser Arg Gly lie Leu Asn Trp Phe 
245 250 255 

Ser Lys Arg Tyr Tyr Thr Gin Val Pro Thr Glu Asp Pro Glu Val Phe 
260 265 270 

Ser Ser Gin Thr Phe Ala Asn Pro Leu Tyr Glu Ala Glu Pro Ala Val 
275 280 285 

Leu Lys Gly Pro Ser Gly Arg Val Gly Leu Ser Gin Val Tyr Lys Pro 
290 295 300 

Asp Thr Leu Thr Thr Arg Ser Gly Thr Glu Val Gly Pro Gin Leu His 
305 310 315 320 

Val Arg Tyr Ser Leu Ser Thr lie His Glu Asp Val Glu Ala lie Pro 
325 330 335 

Tyr Thr Val Asp Glu Asn Thr Gin Gly Leu Ala Phe Val Pro Leu His 
340 345 350 

Glu Glu Gin Ala Gly Phe Glu Glu lie Glu Leu Asp Asp Phe Ser Glu 
355 360 365 

Thr His Arg Leu Leu Pro Gin Asn Thr Ser Ser Thr Pro Val Gly Ser 
370 375 380 

Gly Val Arg Arg Ser Leu lie Pro Thr Arg Glu Phe Ser Ala Thr Arg 
385 390 395 400 

Pro Thr Gly Val Val Thr Tyr Gly Ser Pro Asp Thr Tyr Ser Ala Ser 
405 410 415 



Pro Val Thr Asp Pro Asp Ser Thr Ser Pro Ser Leu Val Ile Asp Asp
420 425 430

Thr Thr Thr Thr Pro Ile Ile Ile Ile Asp Gly His Thr Val Asp Leu
435 440 445

Tyr Ser Ser Asn Tyr Thr Leu His Pro Ser Leu Leu Arg Lys Arg Lys
450 455 460

Lys Arg Lys His Ala
465


<210> 9 
<211> 717 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Aequorea 
victoria gfp gene (humanized) 

<220> 

<221> CDS 

<222> (1)..(717)

<400> 9 

atg agc aag ggc gag gaa ctg ttc act ggc gtg gtc cca att ctc gtg 48
Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val
1 5 10 15

gaa ctg gat ggc gat gtg aat ggg cac aaa ttt tct gtc age gga gag 96 
Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu 
20 25 30 

ggt gaa ggt gat gcc aca tac gga aag etc ace ctg aaa ttc ate tgc 144 
Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe He Cys 
35 40 45 

acc act gga aag etc cct gtg cca tgg cca aca ctg gtc act ace ttc 192 
Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe 
50 55 60 

tct tat ggc gtg cag tgc ttt tec aga tac cca gac cat atg aag cag 240 
Ser Tyr Gly Val Gin Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gin 
65 70 75 80 

cat gac ttt ttc aag age gcc atg ccc gag ggc tat gtg cag gag aga 288 
His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gin Glu Arg 
85 90 95 

acc ate ttt ttc aaa gat gac ggg aac tac aag acc cgc get gaa gtc 336 
Thr He Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val 
100 105 110 

aag ttc gaa ggt gac acc ctg gtg aat aga atc gag ctg aag ggc att 384
Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
115 120 125

gac ttt aag gag gat gga aac att ctc ggc cac aag ctg gaa tac aac 432
Asp Phe Lys Glu Asp Gly Asn Ile Leu Gly His Lys Leu Glu Tyr Asn
130 135 140

tat aac tcc cac aat gtg tac atc atg gcc gac aag caa aag aat ggc 480
Tyr Asn Ser His Asn Val Tyr Ile Met Ala Asp Lys Gln Lys Asn Gly
145 150 155 160

atc aag gtc aac ttc aag atc aga cac aac att gag gat gga tcc gtg 528
Ile Lys Val Asn Phe Lys Ile Arg His Asn Ile Glu Asp Gly Ser Val
165 170 175

cag ctg gcc gac cat tat caa cag aac act cca atc ggc gac ggc cct 576
Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro
180 185 190

gtg ctc ctc cca gac aac cat tac ctg tcc acc cag tct gcc ctg tct 624
Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gln Ser Ala Leu Ser
195 200 205

aaa gat ccc aac gaa aag aga gac cac atg gtc ctg ctg gag ttt gtg 672
Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val
210 215 220

acc gct gct ggg atc aca cat ggc atg gac gag ctg tac aag tga 717
Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235 



<210> 10 
<211> 238 
<212> PRT 

<213> Artificial Sequence 
<400> 10 

Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro lie Leu Val 
15 10 15 

Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu 
20 25 30 

Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe lie Cys 
35 40 45 

Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe 
50 55 60 

Ser Tyr Gly Val Gin Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gin 
65 70 75 80 

His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gin Glu Arg 
85 90 95 

Thr Ile Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val
100 105 110

Lys Phe Glu Gly Asp Thr Leu Val Asn Arg Ile Glu Leu Lys Gly Ile
115 120 125 

Asp Phe Lys Glu Asp Gly Asn lie Leu Gly His Lys Leu Glu Tyr Asn 
130 135 140 

Tyr Asn Ser His Asn Val Tyr lie Met Ala Asp Lys Gin Lys Asn Gly 
145 150 155 160 

He Lys Val Asn Phe Lys He Arg His Asn He Glu Asp Gly Ser Val 
165 170 175 

Gin Leu Ala Asp His Tyr Gin Gin Asn Thr Pro He Gly Asp Gly Pro 
180 185 190 

Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gin Ser Ala Leu Ser 
195 200 205 

Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val 
210 215 220 

Thr Ala Ala Gly He Thr His Gly Met Asp Glu Leu Tyr Lys 
225 230 235 



<210> 11 
<211> 717 
<212> DNA 

<213> Artificial Sequence 

<220> 

<221> CDS 

<222> (1)..(717)

<220> 

<223> Description of Artificial Sequence: Synthetic gfp 
gene (Papillomavirusized) 

<220> 

<223> Codons of humanized gfp gene replaced with 
synonymous codons used at relatively high 
frequency by papillomavirus genes 

<400> 11 

atg agt aaa ggg gaa gaa cta ttt aca ggg gtg gtg cct ata cta gtg 48
Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val
1 5 10 15

gaa eta gat ggg gat gtg aat ggg cac aaa ttt tct gtc agt ggg gaa 96 
Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu 
20 25 30 

ggg gaa ggg gat gca aca tat ggg aaa cta aca cta aaa ttt ata tgc 144
Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe Ile Cys
35 40 45

aca aca ggg aaa cta cct gtg cca tgg cct aca cta gtg aca aca ttt 192
Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe 
50 55 60 

agt tat ggg gtg caa tgc ttt agt aga tat cct gat cat atg aaa caa 240 
Ser Tyr Gly Val Gin Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gin 
65 70 75 80 

cat gat ttt ttt aaa agt gca atg ccc gag ggg tat gtg caa gaa aga 288 
His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gin Glu Arg 
85 90 95 

aca ata ttt ttt aaa gat gat ggg aat tat aaa aca aga gca gaa gtc 336 
Thr lie Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val 
100 105 110 

aaa ttt gaa ggg gat aca eta gtg aat aga ata gag etc aaa ggg ata 384 
Lys Phe Glu Gly Asp Thr Leu Val Asn Arg lie Glu Leu Lys Gly lie 
115 120 125 

gat ttt aaa gaa gat ggg aat ata eta ggg cat aaa eta gaa tat aat 432 
Asp Phe Lys Glu Asp Gly Asn lie Leu Gly His Lys Leu Glu Tyr Asn 
130 135 140 

tat aat agt cat aat gtg tat ata atg gca gat aaa caa aaa aat ggg 480 
Tyr Asn Ser His Asn Val Tyr lie Met Ala Asp Lys Gin Lys Asn Gly 
145 150 155 160 

ata aaa gtg aat ttt aaa ata ata aga cat ata gaa gat gga tec gtg 528 
lie Lys Val Asn Phe Lys lie lie Arg His lie Glu Asp Gly Ser Val 
165 170 175 

caa cta gca gat cat tat caa caa aat aca cct ata ggg gat ggg cct 576
Gln Leu Ala Asp His Tyr Gln Gln Asn Thr Pro Ile Gly Asp Gly Pro
180 185 190

gtg eta eta cct gat aac cat tat eta agt aca caa agt gca eta agt 624 
Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gin Ser Ala Leu Ser 
195 200 205 

aaa gat cct aat gaa aaa aga gat cat atg gtg eta etc gag ttt gtg 672 
Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val 
210 215 220 

aca gca gca ggg ata aca cat ggg atg gat gaa cta tat aaa tga 717
Thr Ala Ala Gly Ile Thr His Gly Met Asp Glu Leu Tyr Lys
225 230 235 



<210> 12 
<211> 238 
<212> PRT 

<213> Artificial Sequence 
<400> 12 

Met Ser Lys Gly Glu Glu Leu Phe Thr Gly Val Val Pro Ile Leu Val
1 5 10 15

Glu Leu Asp Gly Asp Val Asn Gly His Lys Phe Ser Val Ser Gly Glu
20 25 30 

Gly Glu Gly Asp Ala Thr Tyr Gly Lys Leu Thr Leu Lys Phe lie Cys 
35 40 45 

Thr Thr Gly Lys Leu Pro Val Pro Trp Pro Thr Leu Val Thr Thr Phe 
50 55 60

Ser Tyr Gly Val Gin Cys Phe Ser Arg Tyr Pro Asp His Met Lys Gin 
65 70 75 80 

His Asp Phe Phe Lys Ser Ala Met Pro Glu Gly Tyr Val Gin Glu Arg 
85 90 95 

Thr He Phe Phe Lys Asp Asp Gly Asn Tyr Lys Thr Arg Ala Glu Val 
100 105 110 

Lys Phe Glu Gly Asp Thr Leu Val Asn Arg He Glu Leu Lys Gly He 
115 120 125 

Asp Phe Lys Glu Asp Gly Asn He Leu Gly His Lys Leu Glu Tyr Asn 
130 135 140 

Tyr Asn Ser His Asn Val Tyr He Met Ala Asp Lys Gin Lys Asn Gly 
145 150 155 160 

He Lys Val Asn Phe Lys He He Arg His He Glu Asp Gly Ser Val 
165 170 175 

Gin Leu Ala Asp His Tyr Gin Gin Asn Thr Pro He Gly Asp Gly Pro 
180 185 190 

Val Leu Leu Pro Asp Asn His Tyr Leu Ser Thr Gin Ser Ala Leu Ser 
195 200 205 

Lys Asp Pro Asn Glu Lys Arg Asp His Met Val Leu Leu Glu Phe Val 
210 215 220 

Thr Ala Ala Gly He Thr His Gly Met Asp Glu Leu Tyr Lys 
225 230 235 



<210> 13
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Ala(GCA)

<400> 13

taaggactgt aagactt 17






<210> 14 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Arg(CGA) 

<400> 14 

cgagccagcc aggagtc 17 



<210> 15
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Asn(AAC)

<400> 15

ctagattggc aggaatt 17



<210> 16
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Asp(GAC)

<400> 16

taagatatat agattat 17



<210> 17
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Cys(TGC)

<400> 17

aagtcttagt agagatt 17



<210> 18
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Glu(GAA) 

<400> 18 

tatttctaca cagcatt 17 



<210> 19 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Gln(CAA) 

<400> 19 

ctaggacaat aggaatt 17 



<210> 20 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Gly(GGA) 

<400> 20 

tactctcttc tgggttt 17 



<210> 21 

<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for His (CAC) 

<400> 21 

tgccgtgact cggattc 17 



<210> 22
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Ile(ATC)

<400> 22

tagaaataag agggctt 17



<210> 23 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence:

Oligonucleotide specific for Leu(CTA) 

<400> 23 

tacttttatt tggattt 17 



<210> 24
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Leu(CTT)

<400> 24

tattagggag aggattt 17



<210> 25
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Lys (AAA)

<400> 25

tcactatgga gatttta 17



<210> 26
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Lys (AAG)

<400> 26

cgcccaacgt ggggctc 17



<210> 27
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Met (elong) 

<400> 27 

tagtacggga aggattt 17 



<210> 28 

<211> 17 

<212> DNA 

<213> Artificial Sequence



<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Phe (TTC) 

<400> 28 

tgtttatggg atacaat 17 



<210> 29 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Pro (CCA) 

<400> 29 

tcaagaagaa ggagcta 17 



<210> 30 

<211> 17 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Pro(CCI) 

<400> 30 

gggctcgtcc gggattt 17 



<210> 31
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Ser(AGC)

<400> 31

ataagaaagg aagatcg 17



<210> 32 
<211> 17 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: 

Oligonucleotide specific for Thr(ACA) 

<400> 32 

tgtcttgaga agagaag 17 



<210> 33
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Tyr(TAC)

<400> 33

tggtaaaaag aggattt 17



<210> 34
<211> 17
<212> DNA

<213> Artificial Sequence
<220>

<223> Description of Artificial Sequence:

Oligonucleotide specific for Val(GTA)

<400> 34

tcagagtgtt cattggt 17